HTML entities are one of those topics developers learn once, use constantly, and somehow never fully nail down. When do you actually need them? What's the difference between &amp; and &#38;? And why does skipping them in the wrong place turn a web form into an attack surface?

This article cuts through the noise. You'll get the five characters you must always encode, a full reference table to bookmark, a clear explanation of how encoding blocks XSS, and the context-specific rules that most tutorials skip.

What an HTML Entity Actually Is

An HTML entity is a text sequence that represents a single character. Every entity starts with & and ends with ;. Between those delimiters is either a name (&amp;) or a number (&#38; in decimal or &#x26; in hex).

The browser's HTML parser recognises these sequences and replaces them with the corresponding character before rendering. So &lt; in source becomes a visible < on screen — it's shown as text, not treated as a tag opener.
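
To make the three syntactic forms concrete, here is a small Python sketch using the standard-library html module. The ENTITY_RE pattern is a hypothetical simplification written for this example. It only matches semicolon-terminated references, while real HTML5 parsers also accept a handful of legacy names without the semicolon.

```python
import html
import re

# Hypothetical, simplified pattern for the three forms:
# named (&amp;), decimal (&#38;) and hexadecimal (&#x26;).
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

sample = "Fish &amp; chips, salt &#38; vinegar, bread &#x26; butter"

# findall returns whole matches because the group is non-capturing.
print(ENTITY_RE.findall(sample))  # ['&amp;', '&#38;', '&#x26;']

# html.unescape replaces each reference with its character, as a parser would.
print(html.unescape(sample))  # Fish & chips, salt & vinegar, bread & butter
```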

There are two reasons to use entities:

  1. Safety — to prevent characters with HTML meaning from being parsed as markup when they should display as text.
  2. Convenience — to insert characters that are hard or impossible to type directly, like © or —.

The Five Characters You Must Always Encode

These five characters have structural meaning in HTML. Any time they appear in text content or attribute values that come from user input or an external source, they must be encoded. Skipping even one is a path to broken markup or an XSS hole.

&  →  &amp;
<  →  &lt;
>  →  &gt;
"  →  &quot;
'  →  &apos;

A quick breakdown of why each one matters:

  • & starts every entity sequence. A bare ampersand followed by letters can be misread as an entity. In body text, &notice renders as ¬ice, because &not is a legacy entity that browsers accept without a semicolon. In URLs inside href attributes, write &amp; between query parameters so the separator can never be mistaken for the start of an entity.
  • < opens HTML tags. Any unencoded < in content can cause the parser to start reading a tag, breaking layout or creating an injection point.
  • > closes HTML tags. It's less dangerous than < in practice, and a lone > in text is valid HTML5. Encoding it still costs nothing, keeps your escaping symmetrical, and avoids edge-case parser issues.
  • " terminates double-quoted attribute values. Inside href="..." or any attribute, an unencoded " ends the value early.
  • ' terminates single-quoted attribute values. Same issue for href='...'.
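
A minimal Python sketch of encoding exactly the five characters listed above. encode_five is a hypothetical helper written for this article. The order matters: & has to be replaced first, or the ampersands in the new entities would be encoded a second time. Python's own html.escape does the same job but writes the apostrophe as &#x27; instead of &apos;.

```python
import html

# Order matters: "&" must come first, otherwise "&lt;" would become "&amp;lt;".
FIVE = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"), ("'", "&apos;")]


def encode_five(text: str) -> str:
    """Hypothetical helper: replace the five structural characters."""
    for char, entity in FIVE:
        text = text.replace(char, entity)
    return text


sample = 'Tom & Jerry\'s <b>"show"</b>'
print(encode_five(sample))
# Tom &amp; Jerry&apos;s &lt;b&gt;&quot;show&quot;&lt;/b&gt;

# The standard library agrees, apart from its spelling of the apostrophe.
print(html.escape(sample))
# Tom &amp; Jerry&#x27;s &lt;b&gt;&quot;show&quot;&lt;/b&gt;
```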

Real-world example

A URL with query parameters like ?name=Alice&city=Berlin inside an href attribute must be written as href="?name=Alice&amp;city=Berlin". Browsers usually recover from a bare & in this position, but XHTML (XML) parsers reject it outright and HTML4 validators flag it as invalid. A parameter name that collides with an entity name also risks being misread.
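
One way to produce that attribute safely in Python, assuming the parameters start out as a dict. First build the query string with urllib.parse.urlencode (URL encoding), then HTML-encode the finished URL before it goes into href. The parameter names and link text are just the example values from above.

```python
import html
from urllib.parse import urlencode

params = {"name": "Alice", "city": "Berlin"}

# Step 1: URL-encode the parameters ("&" here is a real query separator).
url = "?" + urlencode(params)

# Step 2: HTML-encode the whole URL for the attribute context.
link = f'<a href="{html.escape(url)}">Profile</a>'

print(url)   # ?name=Alice&city=Berlin
print(link)  # <a href="?name=Alice&amp;city=Berlin">Profile</a>
```

The parser decodes &amp; back to &, so the browser still requests ?name=Alice&city=Berlin.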

Named vs Numeric Entities

Every HTML entity can be written two ways: by name or by Unicode code point number.

Three ways to write the same character (©)
Named:    &copy;        ← readable, human-friendly
Decimal:  &#169;        ← Unicode code point in base 10
Hex:      &#xA9;        ← Unicode code point in base 16

All three render as: ©

Named entities are easier to read and write. You see &mdash; and immediately know it's an em dash. The downside is that not every character has a name. HTML5 defines a little over 2,000 named references (HTML4 had only 252), compared to more than a million Unicode code points.

Numeric entities work for any Unicode character, named or not. If you need a specific emoji or obscure symbol, use &#[decimal code point]; or &#x[hex code point];. They're less readable but universally applicable.
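
Because the numeric forms are just the code point written in base 10 or base 16, they can be generated for any character. The entity_forms helper below is a hypothetical sketch. It looks names up in Python's html.entities.codepoint2name, which only covers the HTML4 names, so a character whose only name is HTML5-era would come back without a named form.

```python
import html
from html.entities import codepoint2name


def entity_forms(char: str) -> dict:
    """Hypothetical helper: every way to write one character as an entity."""
    cp = ord(char)
    forms = {"decimal": f"&#{cp};", "hex": f"&#x{cp:X};"}
    name = codepoint2name.get(cp)  # HTML4 names only
    if name is not None:
        forms["named"] = f"&{name};"
    return forms


print(entity_forms("©"))
# {'decimal': '&#169;', 'hex': '&#xA9;', 'named': '&copy;'}
print(entity_forms("\U0001F600"))  # grinning-face emoji: no name, numeric only
# {'decimal': '&#128512;', 'hex': '&#x1F600;'}

# All three spellings decode to the same character.
print(html.unescape("&copy; &#169; &#xA9;"))  # © © ©
```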

A note on &apos;

&apos; for the apostrophe was defined in XML and XHTML but was not part of HTML4. It's fully supported in HTML5 and all modern browsers, so use it freely. The main exception is very old browsers, notably Internet Explorer 8 and earlier in HTML mode, which don't recognise it. If you still have to support them, use the numeric &#39; instead.

How HTML Encoding Stops XSS

Cross-site scripting (XSS) happens when an attacker's JavaScript ends up running in another user's browser. The most common path: a user submits text containing <script> tags or event handler attributes, and the server returns that text inside an HTML page without encoding it.

✗ Unencoded — dangerous
User input:
<script>document.cookie</script>

Inserted into page:
<p><script>document.cookie</script></p>

Result: the script executes with full access to the user's cookies
✓ Encoded — safe
User input (same):
<script>document.cookie</script>

After encoding:
<p>&lt;script&gt;document.cookie&lt;/script&gt;</p>

Result: displays as text, does nothing

Encoding works because &lt;script&gt; is text content. The browser displays the characters <script> on screen rather than creating a script element. The attack is neutralised at the parsing stage, because the parser never sees a tag to build.
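
The same effect in a Python sketch. Two hypothetical render functions place untrusted input in element content and in a double-quoted attribute, and both pass it through html.escape first. The second payload is a classic attribute breakout. With quote=True (the default), html.escape turns the quotes into &quot;, so the value can't close the attribute.

```python
import html


def render_comment(user_input: str) -> str:
    """Hypothetical template: untrusted text as element content."""
    return f"<p>{html.escape(user_input)}</p>"


def render_tooltip(user_input: str) -> str:
    """Hypothetical template: untrusted text inside a quoted attribute."""
    return f'<span title="{html.escape(user_input)}">?</span>'


print(render_comment("<script>document.cookie</script>"))
# <p>&lt;script&gt;document.cookie&lt;/script&gt;</p>

print(render_tooltip('" onmouseover="alert(1)'))
# <span title="&quot; onmouseover=&quot;alert(1)">?</span>
```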

Encoding is context-dependent

HTML encoding alone is not sufficient in every context:

  • JavaScript strings: data inside <script> tags needs JavaScript escaping (\", \', \\), not HTML entities. It must also never contain a literal </script>, which ends the script block no matter how it's quoted.
  • CSS values: data inside style attributes needs CSS escaping
  • URL parameters: values in href/src attributes need URL encoding (%20, etc.) in addition to HTML encoding of the surrounding attribute. Also check the URL's scheme so a javascript: URL can't slip through.
  • JavaScript event handlers: avoid putting untrusted data directly in onclick, onmouseover, etc.
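
A rough Python sketch of the JavaScript-string and URL-parameter rules, using one hypothetical untrusted value in two contexts. For the URL parameter, the value is percent-encoded with urllib.parse.quote and the whole URL is then HTML-encoded for the attribute. For the JavaScript string, the value goes through json.dumps and every < is rewritten as \u003c, so the data can't contain a literal </script>. This is one common approach, not the only one, and it doesn't cover CSS or event-handler contexts.

```python
import html
import json
from urllib.parse import quote

untrusted = "</script><b>hi</b> & 'bye'"

# URL parameter: percent-encode the value (safe="" also encodes "/"),
# then HTML-encode the complete URL for the href attribute.
href = "/search?q=" + quote(untrusted, safe="")
link = f'<a href="{html.escape(href)}">Search</a>'

# JavaScript string: json.dumps yields a valid JS string literal; escaping
# "<" as \u003c keeps "</script>" from closing the script block early.
js_literal = json.dumps(untrusted).replace("<", "\\u003c")
script = f"<script>const q = {js_literal};</script>"

print(link)
print(script)
```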

When You Don't Need to Encode

Modern UTF-8 HTML documents served with <meta charset="UTF-8"> can contain most Unicode characters directly — accented letters, emoji, currency symbols, math operators — without any encoding. You don't need to write &eacute; for é if your document is UTF-8 and your text editor saves in UTF-8.

The only characters you must always encode, regardless of charset, are the five structural ones: &, <, >, ", and '.

Entities are still the better choice in these cases:

  • Your CMS or template system mangles certain characters on save
  • You're working in a legacy ASCII-only environment
  • The character is difficult to type and the entity name is more readable (&mdash; for —)
  • You want explicit documentation that a space is intentionally non-breaking (&nbsp;)
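
For the legacy ASCII-only case, Python can do the conversion in two steps. to_ascii_html is a hypothetical helper. It first encodes the five structural characters with html.escape, then uses the xmlcharrefreplace error handler to turn every remaining non-ASCII character into a decimal numeric entity. In a UTF-8 document you would stop after the first step.

```python
import html


def to_ascii_html(text: str) -> str:
    """Hypothetical helper for an ASCII-only pipeline."""
    # Step 1: the five structural characters, needed in any charset.
    escaped = html.escape(text)
    # Step 2: every non-ASCII character becomes &#<decimal>;.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("Café <b>5 €</b> ©"))
# Caf&#233; &lt;b&gt;5 &#8364;&lt;/b&gt; &#169;
```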

Typography Entities Worth Knowing

These are the most useful typography-related entities — the ones that separate professional copy from sloppy copy.

—  →  &mdash;
–  →  &ndash;
“  →  &ldquo;
”  →  &rdquo;
‘  →  &lsquo;
’  →  &rsquo;
…  →  &hellip;
(non-breaking space)  →  &nbsp;

Quick usage rules:

  • Em dash (—) marks a strong parenthetical break or interruption. American style (Chicago, for example) sets it closed, with no space on either side. Many British publishers use a spaced en dash for the same job instead.
  • En dash (–) is used for ranges (pages 12–18, 2020–2026). It is also often substituted for a minus sign in text, although the true minus sign is &minus; (U+2212).
  • Curly quotes (“ ” ‘ ’) are the typographically correct form. Straight quotes (" and ') are typewriter artifacts. Use curly quotes in published content.
  • Ellipsis (…) is a single character, not three periods. It spaces differently and doesn't break across lines.
  • Non-breaking space (&nbsp;) prevents a line break between two words. Use it between a number and its unit (10&nbsp;kg) and after abbreviated titles (Dr.&nbsp;Smith).
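
The &nbsp; rule for units and titles is easy to automate in a pre-publishing step. This Python sketch is illustrative only. bind_with_nbsp, the unit list, and the title list are assumptions made for the example. The function expects text that is already HTML-encoded, since it inserts the &nbsp; entity literally.

```python
import re

# Assumed, deliberately short lists; extend them for real copy.
UNIT_RE = re.compile(r"(\d)\s+(kg|g|km|m|cm|mm|ml|l)\b")
TITLE_RE = re.compile(r"\b(Dr|Mr|Mrs|Ms|Prof)\.\s+")


def bind_with_nbsp(text: str) -> str:
    """Hypothetical helper: keep numbers with units and titles with names."""
    text = UNIT_RE.sub(r"\1&nbsp;\2", text)
    return TITLE_RE.sub(r"\1.&nbsp;", text)


print(bind_with_nbsp("Dr. Smith carried 10 kg for 3 km"))
# Dr.&nbsp;Smith carried 10&nbsp;kg for 3&nbsp;km
```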

Full Entity Reference Table

The entities developers actually use — grouped by category. Use our HTML Encoder to quickly encode any of these into your markup.

Char  Named      Numeric   Description

Essential (always encode)
&     &amp;      &#38;     Ampersand
<     &lt;       &#60;     Less-than sign
>     &gt;       &#62;     Greater-than sign
"     &quot;     &#34;     Double quotation mark
'     &apos;     &#39;     Apostrophe / single quote

Spaces & punctuation
      &nbsp;     &#160;    Non-breaking space
—     &mdash;    &#8212;   Em dash
–     &ndash;    &#8211;   En dash
…     &hellip;   &#8230;   Horizontal ellipsis
•     &bull;     &#8226;   Bullet point
«     &laquo;    &#171;    Left double angle quote
»     &raquo;    &#187;    Right double angle quote

Typographic quotes
“     &ldquo;    &#8220;   Left double quotation mark
”     &rdquo;    &#8221;   Right double quotation mark
‘     &lsquo;    &#8216;   Left single quotation mark
’     &rsquo;    &#8217;   Right single quotation mark (apostrophe)

Symbols & intellectual property
©     &copy;     &#169;    Copyright sign
®     &reg;      &#174;    Registered trademark
™     &trade;    &#8482;   Trade mark sign

Currency
€     &euro;     &#8364;   Euro sign
£     &pound;    &#163;    Pound sterling
¥     &yen;      &#165;    Yen / Yuan sign
¢     &cent;     &#162;    Cent sign

Math & science
°     &deg;      &#176;    Degree sign
±     &plusmn;   &#177;    Plus-minus sign
×     &times;    &#215;    Multiplication sign
÷     &divide;   &#247;    Division sign
½     &frac12;   &#189;    Vulgar fraction one half
¼     &frac14;   &#188;    Vulgar fraction one quarter
¾     &frac34;   &#190;    Vulgar fraction three quarters
∞     &infin;    &#8734;   Infinity
√     &radic;    &#8730;   Square root
∑     &sum;      &#8721;   N-ary summation
π     &pi;       &#960;    Greek small letter pi

Arrows
→     &rarr;     &#8594;   Rightward arrow
←     &larr;     &#8592;   Leftward arrow
↑     &uarr;     &#8593;   Upward arrow
↓     &darr;     &#8595;   Downward arrow
↔     &harr;     &#8596;   Left right arrow

Card suits
♠     &spades;   &#9824;   Black spade suit
♥     &hearts;   &#9829;   Black heart suit
♦     &diams;    &#9830;   Black diamond suit
♣     &clubs;    &#9827;   Black club suit
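
If you need a row that isn't listed, Python's html.entities.html5 dictionary maps every HTML5 entity name (with its trailing semicolon) to its character, so rows in this format can be generated. table_row is a hypothetical helper. It assumes the name maps to a single character, which holds for every entry above but not for a few HTML5 names that expand to two code points.

```python
from html.entities import html5


def table_row(name: str) -> str:
    """Hypothetical helper: 'Char  Named  Numeric' for one entity name."""
    char = html5[name + ";"]  # keys include the trailing semicolon
    return f"{char}  &{name};  &#{ord(char)};"


for name in ["apos", "euro", "hearts", "rarr"]:
    print(table_row(name))
# '  &apos;  &#39;
# €  &euro;  &#8364;
# ♥  &hearts;  &#9829;
# →  &rarr;  &#8594;
```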

Using Entities in Code

Here's how to handle HTML encoding in the languages and frameworks developers use most in 2026.

JavaScript (vanilla)

Manual HTML encode in plain JavaScript
function htmlEncode(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Use it before inserting untrusted content
element.innerHTML = htmlEncode(userInput);

Prefer textContent over innerHTML

When you just need to display text, assign to element.textContent rather than element.innerHTML. The string is inserted as a plain text node and never parsed as HTML. No manual escaping is needed, and there's no risk of accidentally creating a script context.

JavaScript (decode using the DOM)

Decode HTML entities via a temporary element
function htmlDecode(str) {
  const el = document.createElement('textarea');
  el.innerHTML = str;
  return el.value;
}

htmlDecode('&lt;p&gt;Hello &amp; welcome&lt;/p&gt;')
// → '<p>Hello & welcome</p>'

PHP

PHP htmlspecialchars and htmlentities
// Encode the 5 essential characters
htmlspecialchars($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Encode ALL named entities (rarely needed in UTF-8 docs)
htmlentities($str, ENT_QUOTES | ENT_HTML5, 'UTF-8');

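// Decode only the special characters (the reverse of htmlspecialchars)
htmlspecialchars_decode($encoded, ENT_QUOTES | ENT_HTML5);

// Decode every named and numeric entity (the reverse of htmlentities)
html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');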

Python

Python html module
import html

# Encode (escapes & < > and, by default, " as &quot; and ' as &#x27;)
html.escape('<p>Hello & World</p>')
# → '&lt;p&gt;Hello &amp; World&lt;/p&gt;'

# Encode only & < > (quote=False leaves both " and ' as-is)
html.escape(s, quote=False)

# Decode (handles named, decimal, and hex entities)
html.unescape('&lt;p&gt;Hello &amp; World&lt;/p&gt;')
# → '<p>Hello & World</p>'

Template engines (React, Vue, Django, Rails)

Most modern template systems auto-escape output by default:

Framework defaults
// React — JSX auto-encodes
<p>{userInput}</p>   ← safe, encoded automatically
<p dangerouslySetInnerHTML={{__html: userInput}}/>  ← UNSAFE

// Vue — double curly auto-encodes
{{ userInput }}   ← safe
v-html="userInput"  ← UNSAFE

// Django templates — auto-escape by default
{{ user_input }}   ← safe
{{ user_input|safe }}  ← UNSAFE — only for trusted HTML

// Rails ERB — auto-escape by default
<%= user_input %>   ← safe
<%= raw user_input %>  ← UNSAFE

The pattern is consistent: the default interpolation syntax is safe; the "raw" or "unsafe HTML" bypass is the thing to avoid with untrusted data.
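
The pattern can be sketched in a few lines of Python. This is not how React, Vue, Django, or Rails are implemented. render and the Safe marker class are hypothetical stand-ins for a template engine's default interpolation and its raw/|safe bypass.

```python
import html


class Safe(str):
    """Hypothetical marker for trusted HTML (the '|safe' / raw bypass)."""


def render(template: str, **values: object) -> str:
    """Escape every value by default; pass Safe values through untouched."""
    prepared = {
        key: value if isinstance(value, Safe) else html.escape(str(value))
        for key, value in values.items()
    }
    return template.format(**prepared)


print(render("<p>{name}</p>", name="<i>Eve</i>"))
# <p>&lt;i&gt;Eve&lt;/i&gt;</p>
print(render("<p>{name}</p>", name=Safe("<i>Eve</i>")))
# <p><i>Eve</i></p>
```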

The &nbsp; Misuse Problem

Non-breaking space is probably the most misused entity. Developers sometimes add multiple &nbsp; characters in a row to create visual indentation or padding. This is a bad pattern for several reasons:

  • Accessibility — screen readers may announce each non-breaking space, or create awkward pauses
  • Maintenance — spacing in markup is brittle; CSS handles it more cleanly
  • Semantics: &nbsp; means "these two things should not be separated", not "add space here"

Use &nbsp; only for its intended purpose: preventing a line break between two specific words or a number and its unit. Use padding, margin, or gap in CSS for visual spacing.