ASCII / Unicode Explorer: Inspect Hidden Characters, UTF-8 Bytes, and the Full ASCII Table
ASCII and Unicode get mixed together constantly in debugging conversations, but they solve different problems. ASCII is a fixed 128-value character set from 0 to 127 (0x00-0x7F). Unicode is the much larger standard that assigns code points across modern scripts, symbols, emoji, controls, and formatting marks. UTF-8 and UTF-16 are encodings and storage forms for those Unicode code points, not separate character sets.
This tool focuses on inspection rather than lossy conversion. Paste text with hidden characters, enter a code point like U+200B or 0x41, and inspect the result locally in the browser. Toolzy breaks text down by Unicode code point, shows UTF-8 bytes and UTF-16 code units, and keeps a canonical 128-row ASCII table on the same page for quick reference.
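As a rough illustration of the code point input described above, the sketch below parses the two notations mentioned (U+200B and 0x41) into a number and formats it back. The function names parseCodePoint and formatCodePoint are hypothetical and are not taken from Toolzy's source.

```typescript
// Hypothetical helper: parse "U+200B" or "0x41" into a code point number.
// Returns null when the text is not in one of those notations or is out of range.
function parseCodePoint(input: string): number | null {
  const match = /^(?:U\+|0x)([0-9A-Fa-f]{1,6})$/i.exec(input.trim());
  if (match === null) return null;
  const value = parseInt(match[1] as string, 16);
  // Unicode code points run from U+0000 to U+10FFFF.
  return value <= 0x10ffff ? value : null;
}

// Hypothetical helper: format a code point as U+XXXX (at least four hex digits).
function formatCodePoint(cp: number): string {
  const hex = cp.toString(16).toUpperCase();
  return "U+" + (hex.length < 4 ? "0000".slice(hex.length) + hex : hex);
}
```

Rejecting anything above U+10FFFF at parse time keeps later steps from having to handle values that are not code points at all.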
What this tool helps you debug
- Zero-width spaces, non-breaking spaces, word joiners, and bidi marks that make strings look normal but behave strangely
- Emoji and supplementary-plane characters that take two UTF-16 code units even though they appear as one visible symbol
- Combining-mark sequences like e + U+0301, where one grapheme cluster spans multiple code points
- Control characters such as TAB, LF, CR, and DEL that should never be rendered as blank output
- Differences between string.length, Unicode code points, and grapheme clusters
ASCII vs Unicode vs UTF-8 vs UTF-16
These terms are related but not interchangeable:
- ASCII is the 7-bit subset U+0000-U+007F
- Unicode assigns abstract code points like U+0041 or U+1F600
- UTF-8 encodes Unicode as one to four bytes
- UTF-16 stores Unicode as one or two 16-bit code units
That distinction matters in JavaScript because strings are sequences of UTF-16 code units. A character like A is one code point and one UTF-16 code unit. A character like 😀 is one code point but two UTF-16 code units. A visible symbol like é may be two code points (a base letter plus a combining mark) but one grapheme cluster.
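The relationship between a code point, its UTF-8 bytes, and its UTF-16 code units can be sketched with standard APIs. This is a minimal illustration, not Toolzy's implementation; the describe function and the CodePointInfo shape are hypothetical names.

```typescript
// Hypothetical record shape for one code point of the input.
interface CodePointInfo {
  codePoint: number; // abstract Unicode code point
  utf8: number[];    // bytes produced by UTF-8 encoding
  utf16: number[];   // 16-bit code units as stored in a JavaScript string
}

function describe(text: string): CodePointInfo[] {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const result: CodePointInfo[] = [];
  // Array.from walks a string by code point, not by UTF-16 code unit.
  for (const ch of Array.from(text)) {
    const utf16: number[] = [];
    for (let i = 0; i < ch.length; i++) utf16.push(ch.charCodeAt(i));
    result.push({
      codePoint: ch.codePointAt(0) as number,
      utf8: Array.from(encoder.encode(ch)),
      utf16,
    });
  }
  return result;
}
```

For A this yields one byte and one code unit, both 0x41. For 😀 (U+1F600) it yields the four UTF-8 bytes F0 9F 98 80 and the surrogate pair D83D DE00.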
Why hidden characters break real workflows
Many production bugs come from characters that are technically valid but visually hard to spot. A URL copied from chat may contain U+200B ZERO WIDTH SPACE. A CMS export may include U+00A0 NO-BREAK SPACE instead of a normal space. Logs may contain smart quotes, directional isolates, or replacement characters.
Those values can affect parsing, equality checks, sorting, wrapping, tokenization, and test fixtures. The point of an explorer is to show the raw reality of the string without silently normalizing or cleaning it up.
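A small scan like the following shows the idea of reporting hidden characters without altering the string. It is a sketch under stated assumptions: the SUSPECTS list is illustrative and far from exhaustive, and findHidden is a hypothetical name.

```typescript
// Illustrative, non-exhaustive list of characters that are easy to miss visually.
const SUSPECTS: Record<number, string> = {
  0x00a0: "NO-BREAK SPACE",
  0x200b: "ZERO WIDTH SPACE",
  0x200e: "LEFT-TO-RIGHT MARK",
  0x200f: "RIGHT-TO-LEFT MARK",
  0x2060: "WORD JOINER",
  0x2066: "LEFT-TO-RIGHT ISOLATE",
  0x2069: "POP DIRECTIONAL ISOLATE",
  0xfeff: "ZERO WIDTH NO-BREAK SPACE",
  0xfffd: "REPLACEMENT CHARACTER",
};

interface Finding {
  index: number;     // UTF-16 offset into the original string
  codePoint: string; // U+XXXX notation
  name: string;
}

// Report suspects in place; the input string is never modified or normalized.
function findHidden(text: string): Finding[] {
  const findings: Finding[] = [];
  // Every suspect above is in the Basic Multilingual Plane, so scanning
  // UTF-16 code units one at a time is sufficient for this list.
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    const name = SUSPECTS[code];
    if (name !== undefined) {
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      findings.push({ index: i, codePoint: "U+" + hex, name });
    }
  }
  return findings;
}
```

Calling findHidden on a URL copied from chat would list the offset of any U+200B it contains, which is usually enough to explain a failed equality check or a 404.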
Surrogate pairs, combining marks, and string.length
If string.length feels wrong, that is because it counts UTF-16 code units rather than the visible units you care about.
- "😀".length is 2 because the emoji is stored as a surrogate pair in UTF-16
- "é".length can also be 2 when the accent is a separate combining mark
- A flag emoji or ZWJ emoji sequence may contain several code points while rendering as one visible symbol
That is why this tool reports UTF-16 code units, Unicode code points, and grapheme clusters separately when the browser supports Intl.Segmenter.
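The three counts can be computed side by side as in the sketch below. It assumes Intl.Segmenter may be missing, so the constructor is looked up dynamically and the grapheme count falls back to null; countUnits and Counts are hypothetical names.

```typescript
interface Counts {
  utf16Units: number;       // what string.length reports
  codePoints: number;       // Unicode code points
  graphemes: number | null; // null when Intl.Segmenter is unavailable
}

function countUnits(text: string): Counts {
  // Looked up through `any` so the sketch compiles and runs even where
  // Intl.Segmenter (or its type definition) is not available.
  const SegmenterCtor = (Intl as any).Segmenter;
  let graphemes: number | null = null;
  if (typeof SegmenterCtor === "function") {
    const segmenter = new SegmenterCtor(undefined, { granularity: "grapheme" });
    graphemes = Array.from(segmenter.segment(text)).length;
  }
  return {
    utf16Units: text.length,
    codePoints: Array.from(text).length, // Array.from iterates by code point
    graphemes,
  };
}
```

A flag emoji such as 🇺🇸 comes back as 4 UTF-16 code units and 2 code points, with a grapheme count of 1 where the segmenter is supported.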
Using the ASCII table correctly
The ASCII table on this page is intentionally strict: exactly 128 rows, exactly 0x00-0x7F. Values from 0x80 to 0xFF are not ASCII, even if older docs call them "extended ASCII". Depending on context, those values belong to an 8-bit code page such as ISO-8859-1 or Windows-1252, or they are Unicode code points in the U+0080-U+00FF range.
If you need to answer questions like "what is ASCII 65" or "what is hex 0x09", the table covers that directly. If you need to answer "what hidden Unicode character is in this payload", use the inspector above it.
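A strict 128-row table is small enough to generate rather than hard-code. The sketch below is one possible way to do it, not the page's actual table code; buildAsciiTable, asciiLabel, and isAscii are hypothetical names, and the control labels are the standard abbreviations (HT is the horizontal tab).

```typescript
// Standard abbreviations for the control characters 0x00-0x1F.
const CONTROL_NAMES: string[] = [
  "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
  "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
  "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
  "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
];

interface AsciiRow {
  dec: number;
  hex: string;
  label: string;
}

// Controls, space, and DEL get a name so they never render as blank output.
function asciiLabel(code: number): string {
  if (code < 0x20) return CONTROL_NAMES[code] as string;
  if (code === 0x20) return "SP";
  if (code === 0x7f) return "DEL";
  return String.fromCharCode(code);
}

// Exactly 128 rows: 0x00 through 0x7F, nothing beyond.
function buildAsciiTable(): AsciiRow[] {
  const rows: AsciiRow[] = [];
  for (let code = 0; code <= 0x7f; code++) {
    const hex = code.toString(16).toUpperCase();
    rows.push({ dec: code, hex: "0x" + (hex.length < 2 ? "0" + hex : hex), label: asciiLabel(code) });
  }
  return rows;
}

// True only when every UTF-16 code unit is in the ASCII range.
function isAscii(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}
```

With this layout the row index equals the decimal value, so "what is ASCII 65" is a lookup at index 65 (A) and "what is hex 0x09" is the row at index 9 (HT).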
Troubleshooting
Why does an emoji count as 2 in JavaScript? — JavaScript string length counts UTF-16 code units. Many emoji are one Unicode code point represented by a surrogate pair, so they take two UTF-16 code units.
Why do two strings look identical but compare differently? — They may use different Unicode sequences, such as a precomposed character versus a base letter plus combining mark, or a normal space versus a non-breaking space.
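To confirm that diagnosis in your own code, you can compare the strings before and after Unicode normalization. This sketch uses the standard String.prototype.normalize method; sameAfterNfc is a hypothetical helper name.

```typescript
const precomposed = "\u00E9"; // é as the single code point U+00E9
const decomposed = "e\u0301"; // e followed by U+0301 COMBINING ACUTE ACCENT

// Compare two strings after converting both to Normalization Form C.
function sameAfterNfc(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const rawEqual = precomposed === decomposed;             // false: different sequences
const nfcEqual = sameAfterNfc(precomposed, decomposed);  // true: NFC composes e + U+0301
const nbspEqual = sameAfterNfc("a\u00A0b", "a b");       // false: NFC keeps U+00A0 as is
```

NFC resolves the precomposed versus combining-mark case but leaves a non-breaking space distinct from a normal space, so that second case still needs an explicit check or replacement.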
Why is UTF-8 unavailable for some pasted text? — The text may contain ill-formed UTF-16 such as a lone surrogate. JavaScript can hold that raw data, but valid UTF-8 requires Unicode scalar values.
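A lone surrogate can be located with a plain scan over the UTF-16 code units. The sketch below is one way to check for it and is not necessarily how Toolzy does it; findLoneSurrogate is a hypothetical name.

```typescript
// Return the UTF-16 offset of the first unpaired surrogate, or -1 if the
// string is well-formed UTF-16.
function findLoneSurrogate(text: string): number {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      // High surrogate: it must be followed by a low surrogate.
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low half of a valid pair
        continue;
      }
      return i;
    }
    // A low surrogate reached here was not preceded by a high surrogate.
    if (unit >= 0xdc00 && unit <= 0xdfff) return i;
  }
  return -1;
}
```

Checking first matters because TextEncoder does not fail on such input: it substitutes U+FFFD REPLACEMENT CHARACTER for the lone surrogate, which would hide the original problem.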
Why is a code point valid even if it renders as a box or tofu? — Unicode validity and device font support are different things. A valid code point can still lack a glyph on the current system.
Why doesn't Toolzy show an official name for every assigned character yet? — This v1 ships with a pinned local name subset for ASCII and common debugging characters so the tool stays lightweight. Status detection still works for all code points without runtime fetches.