The Complete Guide to URL Encoding

Percent-encoding is formally defined in RFC 3986 (2005), which obsoleted RFC 2396 and updated the original URL specification, RFC 1738. RFC 3986 defines the generic URI grammar, the exact set of unreserved characters (ASCII letters, digits, and - . _ ~), and when percent-encoding is required versus optional. RFC 3987 extends this to Internationalized Resource Identifiers (IRIs), which allow Unicode characters directly.


URL encoding in different contexts

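The rules shift depending on where in the URL you're encoding: the path, the query string, and the fragment.

Each context has a slightly different reserved set. For example, a / separates segments in a path but is allowed as ordinary data in a query string. A + in a query string might mean a space (form encoding, application/x-www-form-urlencoded) or a literal plus (RFC 3986), depending on whether the server expects form encoding.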
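
The ambiguity of + can be seen with two built-in JavaScript APIs. This is a minimal sketch: encodeURIComponent() follows the RFC 3986 style, while URLSearchParams follows form encoding. The parameter name q and the sample values are made up for illustration.

```javascript
// Encoding: RFC 3986 style vs. form style (application/x-www-form-urlencoded).
const rfc3986Encoded = encodeURIComponent("a b+c");                 // "a%20b%2Bc"
const formEncoded = new URLSearchParams({ q: "a b+c" }).toString(); // "q=a+b%2Bc"

// Decoding: the same "+" is read differently by the two conventions.
const plusKept = decodeURIComponent("a+b");                // "a+b" (literal plus)
const plusAsSpace = new URLSearchParams("q=a+b").get("q"); // "a b" (plus means space)
```

Both encoders turn a literal plus into %2B, so an encoded value survives either kind of decoder. The trouble only starts when a raw + is placed in the query string unencoded.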


Unicode in URLs

Non-ASCII characters — emoji, accented letters, CJK characters — are first converted to their UTF-8 byte representation, then each byte is percent-encoded individually:

café → caf%C3%A9
  é = U+00E9
  UTF-8 bytes: 0xC3 0xA9
  Encoded: %C3%A9
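
The two steps above (convert to UTF-8 bytes, then percent-encode each byte) can be sketched by hand in JavaScript. The helper name percentEncodeBytes is hypothetical and exists only for this illustration. Unlike encodeURIComponent(), it encodes every byte, including ASCII ones.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string.
function percentEncodeBytes(text) {
  const bytes = new TextEncoder().encode(text); // TextEncoder always produces UTF-8
  return Array.from(
    bytes,
    (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

// "\u00E9" is é (U+00E9); its UTF-8 bytes are 0xC3 0xA9.
const encodedByHand = percentEncodeBytes("\u00E9"); // "%C3%A9"

// The built-in leaves unreserved ASCII alone and encodes the rest the same way.
const encodedBuiltIn = encodeURIComponent("caf\u00E9"); // "caf%C3%A9"
```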

IRIs (RFC 3987) allow Unicode directly in identifiers, and modern browsers display decoded Unicode in the address bar. Under the hood, though, the HTTP request carries percent-encoded UTF-8 bytes. If you're building URLs programmatically with non-ASCII data, always encode each component with encodeURIComponent(), which handles the UTF-8 conversion automatically.
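
A short sketch of what this looks like in practice, assuming an environment with the standard URL class (browsers and current Node.js). The host example.com is a placeholder. The last lines show one edge case: a string containing a lone surrogate is not well-formed UTF-16, so it cannot be converted to UTF-8 and encodeURIComponent() throws.

```javascript
// The URL parser percent-encodes non-ASCII path characters as UTF-8 bytes.
const parsed = new URL("https://example.com/caf\u00E9"); // placeholder host
const wirePath = parsed.pathname; // "/caf%C3%A9"

// encodeURIComponent does the same for a single component, including emoji.
const emojiEncoded = encodeURIComponent("\u{1F600}"); // "%F0%9F%98%80" (4 UTF-8 bytes)
const roundTrips = decodeURIComponent(emojiEncoded) === "\u{1F600}"; // true

// A lone surrogate has no UTF-8 representation, so encoding fails.
let loneSurrogateError = null;
try {
  encodeURIComponent("\uD800");
} catch (err) {
  loneSurrogateError = err; // a URIError
}
```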


Troubleshooting

Non-ASCII characters show as garbage after decoding — Encoding mismatch. The URL was percent-encoded from UTF-8 bytes, but the decoder is interpreting them as Latin-1 or another single-byte encoding. Make sure both sides agree on UTF-8. In JavaScript, decodeURIComponent() always expects UTF-8, which is almost always correct.
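
The mismatch can be reproduced in a few lines. This sketch uses a hypothetical helper, percentDecodeToBytes, which assumes its input contains only ASCII characters and well-formed %XX escapes. It then reads the same bytes once as UTF-8 and once as Latin-1, where each byte becomes one character.

```javascript
// Hypothetical helper: turn "%C3%A9"-style escapes into raw byte values.
// Assumes ASCII input with well-formed %XX escapes.
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

const rawBytes = percentDecodeToBytes("caf%C3%A9");

// Correct: interpret the bytes as UTF-8.
const asUtf8 = new TextDecoder("utf-8").decode(rawBytes); // "caf\u00E9" (café)

// Wrong: treat each byte as one Latin-1 character.
const asLatin1 = String.fromCharCode(...rawBytes); // "caf\u00C3\u00A9" (cafÃ©)

// The reverse mistake: %E9 is é in Latin-1 but not valid UTF-8 on its own.
let malformedError = null;
try {
  decodeURIComponent("caf%E9");
} catch (err) {
  malformedError = err; // a URIError
}
```

If the decoded text shows pairs like Ã© where a single accented letter belongs, the bytes were UTF-8 and the decoder assumed a single-byte encoding.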

API returns 400 when a parameter value contains special characters — The value likely isn't being encoded before being placed in the URL. Wrap it with encodeURIComponent(). Common culprits: email addresses (contain @ and sometimes +), file paths (contain /), and search queries (contain spaces and punctuation).
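
As a minimal sketch of the fix, the helper below encodes both the parameter name and its value before appending them to a URL. The endpoint https://api.example.com/lookup, the helper name withParam, and the parameter names are all hypothetical.

```javascript
const baseUrl = "https://api.example.com/lookup"; // hypothetical endpoint

// Hypothetical helper: append one query parameter, encoding name and value.
function withParam(base, name, value) {
  return base + "?" + encodeURIComponent(name) + "=" + encodeURIComponent(value);
}

const emailUrl = withParam(baseUrl, "email", "user+tag@example.com");
// https://api.example.com/lookup?email=user%2Btag%40example.com

const fileUrl = withParam(baseUrl, "file", "reports/2024/q1.pdf");
// https://api.example.com/lookup?file=reports%2F2024%2Fq1.pdf

const searchUrl = withParam(baseUrl, "q", "rock & roll?");
// https://api.example.com/lookup?q=rock%20%26%20roll%3F
```

Without the encoding step, the & in the last value would be read as the start of a second parameter and the ? would be sent unescaped.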