HTML Entity Encoder / Decoder Guide

What is an HTML entity encoder/decoder?

An HTML entity encoder converts characters into HTML character references so the browser renders them as text instead of interpreting them as markup. The classic examples are < to &lt;, > to &gt;, and & to &amp;. An HTML entity decoder does the reverse and turns those references back into the underlying characters.

This matters anywhere text and markup meet. If you want to display <button> as text in a tutorial, an HTML entity encoder is the right tool. If you pulled Fish &amp; Chips from a CMS or an API response and need the readable version (Fish & Chips), an HTML entity decoder is the right tool. Toolzy's converter runs in the browser, so your content stays local while you encode or decode.
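As a rough illustration of the two directions described above, here is a minimal sketch using Python's standard html module. It is only an analogy for what an encoder/decoder does, not a description of how Toolzy's converter is implemented.

```python
import html

# Encode: show the literal text <button> instead of rendering a button.
encoded = html.escape("<button>")
print(encoded)  # &lt;button&gt;

# Decode: recover the readable text from entity-encoded content.
decoded = html.unescape("Fish &amp; Chips")
print(decoded)  # Fish & Chips
```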

How to use this tool

  1. Paste the text, markup, or entity-encoded content into the input.
  2. Choose whether you want to encode or decode.
  3. Review the output for the context you're targeting.
  4. Copy the result into your template, docs, or debugging workflow.

Quick check: if the browser is mistakenly treating your text as HTML, you probably need encoding. If your content is full of &lt;, &amp;, and &#160;, you probably need decoding.
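The quick check above could be approximated in code along these lines. This is a heuristic sketch only: the names REFERENCE, count_references, and suggest_action are made up for illustration, and the pattern matches anything shaped like a character reference without checking that the name is a real entity.

```python
import re

# Matches text shaped like named (&amp;), decimal (&#160;), or hex (&#xA0;) references.
REFERENCE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")

def count_references(text: str) -> int:
    """Count substrings that look like character references."""
    return len(REFERENCE.findall(text))

def suggest_action(text: str) -> str:
    """Rough hint: 'decode' if references are present, 'encode' if raw markup characters are."""
    if count_references(text) > 0:
        return "decode"
    if any(ch in text for ch in "<>&"):
        return "encode"
    return "nothing to do"

print(suggest_action("Fish &amp; Chips"))  # decode
print(suggest_action("<button>"))          # encode
```

Mixed input (raw markup plus existing references) is reported as "decode" here, which may not be what you want, so treat the result as a hint and review the output.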

Common use cases

Named vs numeric entities

HTML supports both named entities and numeric character references.

Named entities are easier for humans to read, especially common ones like &amp; and &nbsp;. Numeric references are useful when you want an explicit code point or when the character has no memorable entity name. Both forms decode to the same character whenever they refer to the same code point.
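A small sketch of that equivalence, assuming Python's html.unescape as the decoder: the named, decimal, and hexadecimal spellings of the same code point all come back as one character.

```python
import html

# Three spellings of the ampersand.
amp_forms = ["&amp;", "&#38;", "&#x26;"]
amp_decoded = [html.unescape(form) for form in amp_forms]
print(amp_decoded)  # ['&', '&', '&']

# Three spellings of the non-breaking space (U+00A0).
nbsp_forms = ["&nbsp;", "&#160;", "&#xA0;"]
nbsp_decoded = {html.unescape(form) for form in nbsp_forms}
print(nbsp_decoded == {"\u00a0"})  # True
```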

One caveat: HTML does not define a named entity for every Unicode character developers might care about. Numeric references are the fallback when you need exact representation for a specific code point.
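One way to picture the numeric fallback is a helper that prefers a named entity and drops to a hexadecimal reference when no name is available. The sketch below assumes Python's html.entities.codepoint2name table, which covers only the HTML 4.01 entity names, so it is narrower than the full HTML named-reference list. The helper name to_reference is hypothetical.

```python
from html.entities import codepoint2name

def to_reference(ch: str) -> str:
    """Return a named reference if the stdlib table has one, else a hex numeric reference."""
    code_point = ord(ch)
    name = codepoint2name.get(code_point)
    if name is not None:
        return f"&{name};"
    return f"&#x{code_point:X};"

print(to_reference("\u00e9"))      # &eacute;
print(to_reference("\u2014"))      # &mdash;
print(to_reference("\U0001F600"))  # &#x1F600; (no named entity in this table)
```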

HTML text vs attribute contexts

Encoding rules depend on where the content lands.

In a text node, the characters that usually matter are < and &. Encoders commonly convert > as well, although it is rarely required there. For example, if you want to display the literal text <em>hello</em>, encoding to &lt;em&gt;hello&lt;/em&gt; is enough.

In an attribute value, quotes matter too. If you render title="Tom & Jerry" from dynamic input, the & should be encoded, and any quote character inside the value needs &quot; (or &#39; for a single quote) when it matches the quote character that wraps the attribute. Context still matters even if you have an HTML entity encoder in front of you.

That is why framework auto-escaping is usually the safer default. Manual conversion is best for debugging, content prep, generated docs, or cases where you explicitly control the output target.
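The text-node and attribute cases above can be sketched with Python's html.escape, whose quote parameter controls whether quote characters are encoded. The sample value and the surrounding markup are illustrative only.

```python
import html

value = 'Tom & Jerry say "hi"'

# Text node: & and < (and > with this function) are encoded; quotes stay literal.
text_safe = html.escape(value, quote=False)
print(f"<p>{text_safe}</p>")
# <p>Tom &amp; Jerry say "hi"</p>

# Attribute value: quotes are encoded too, so they cannot close the attribute early.
attr_safe = html.escape(value, quote=True)
print(f'<span title="{attr_safe}">cartoon</span>')
# <span title="Tom &amp; Jerry say &quot;hi&quot;">cartoon</span>
```

This covers only text nodes and quoted attribute values. Other targets, such as URLs, inline scripts, or styles, have their own escaping rules that entity encoding does not address.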

&nbsp;, Unicode, and invisible differences

&nbsp; decodes to a non-breaking space, which is Unicode U+00A0. It is not the same character as a regular space (U+0020), even though many editors render them identically.

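This difference matters in layout and comparisons: a non-breaking space prevents a line break at that position, and a string containing U+00A0 does not compare equal to the same string written with U+0020.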
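A short sketch of the comparison problem, assuming Python's html.unescape as the decoder. The final replace step is one possible normalization, appropriate only when you actually want regular spaces.

```python
import html

decoded = html.unescape("10&nbsp;km")
plain = "10 km"

# The strings look alike on screen but contain different characters.
print(decoded == plain)      # False
print("\u00a0" in decoded)   # True: the decoded text holds U+00A0

# Optional normalization before comparing or searching.
normalized = decoded.replace("\u00a0", " ")
print(normalized == plain)   # True
```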

The same principle applies to other Unicode characters. Curly quotes, em dashes, emoji, and accented characters may be represented directly or via numeric entities like &#x2014; or &#233;. After decoding, you have characters. After re-encoding, you may get a different but equivalent entity form.
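To illustrate how re-encoding can change the spelling, the sketch below decodes a hexadecimal reference and then re-encodes non-ASCII characters with Python's xmlcharrefreplace error handler, which always writes decimal references. Note that this handler leaves &, <, and > untouched, so it is not a substitute for escaping markup characters.

```python
import html

# Decoding numeric references gives plain characters.
dash = html.unescape("&#x2014;")
accented = html.unescape("&#233;")
print(dash == "\u2014", accented == "\u00e9")  # True True

# Re-encoding non-ASCII characters: the hex form comes back as decimal.
text = "caf\u00e9 \u2014 ok"
reencoded = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(reencoded)  # caf&#233; &#8212; ok
```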

Decoding vs sanitization

Developers mix these up all the time, and they solve different problems.

Decoding entities converts representation. Sanitization enforces safety rules. If you decode &lt;img src=x onerror=alert(1)&gt;, you now have a literal <img> tag with an event handler. That may be useful for inspection, but it is not safe to inject into the DOM as trusted HTML.

Use an HTML entity decoder when you need to read or transform content. Use a sanitizer when you need to allow only safe markup. Use proper escaping when you need to output plain text into HTML. Those are separate steps.
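The separation between decoding and escaping can be sketched as follows, again assuming Python's html module. No sanitizer is shown, because sanitization needs a dedicated library with an allowlist of tags and attributes. The point here is only that decoding by itself produces live markup.

```python
import html

stored = "&lt;img src=x onerror=alert(1)&gt;"

# Decoding changes representation only: the result is real markup.
decoded = html.unescape(stored)
print(decoded)  # <img src=x onerror=alert(1)>

# To show it as plain text inside HTML, escape it again before output.
safe_text = html.escape(decoded)
print(safe_text == stored)  # True
```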

Round-trip caveats

Round-tripping through encode -> decode or decode -> encode preserves the character data, but not necessarily the original spelling of the entities.

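Examples: &#60; and &#x3C; both decode to <, and a later encode step typically writes it back as &lt;. Likewise, &#160; decodes to a non-breaking space that an encoder may emit as &nbsp;, &#160;, or &#xA0;.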

This matters if you're diffing generated HTML, preserving authoring style, or trying to keep a legacy file byte-for-byte identical. An HTML entity encoder/decoder is usually value-preserving, not format-preserving.
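A minimal round-trip sketch, assuming Python's html.unescape and html.escape as the decoder and encoder. Other encoders may pick different spellings, which is exactly the caveat described above.

```python
import html

# Three spellings of < all come back as the same entity after a round trip.
variants = ["&lt;", "&#60;", "&#x3C;"]
round_tripped = [html.escape(html.unescape(v)) for v in variants]
print(round_tripped)  # ['&lt;', '&lt;', '&lt;']

# The named apostrophe entity comes back as a hex reference with this encoder.
apostrophe = html.escape(html.unescape("&apos;"))
print(apostrophe)  # &#x27;
```

In each case the decoded character is preserved, but the original entity spelling is not.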

Practical examples

Troubleshooting

Why did my decoded output create actual HTML tags? — Because decoding turns entity references back into literal characters. If those characters form markup like <strong> or <script>, the result is just HTML again.

Why is &nbsp; still affecting layout after decoding? — It decodes to a non-breaking space character, not a regular space. The browser still treats it as non-breaking whitespace.

Why doesn't the output match the exact entity style I started with? — Round-tripping usually normalizes equivalent forms. &lt;, &#60;, and &#x3C; all represent <, and a later encode step may choose a different valid representation.

Why did my converted output clear after I edited the input? — Editing the input resets the previous result so swap and copy actions stay tied to the current text. Convert again to regenerate output for the new content.

Does decoding entities sanitize untrusted HTML? — No. Decoding is not sanitization. It can expose markup that becomes dangerous if inserted into the DOM without proper sanitizing or escaping.

Should I encode every Unicode character? — Usually no. Modern HTML handles Unicode directly. Encode characters when they have HTML syntax meaning, when the target system requires entity form, or when you want explicit readability for a specific code point.