The Complete Guide to URL Encoding
Percent-encoding is formally defined in RFC 3986 (2005), which obsoletes the earlier generic URI syntax in RFC 2396 and updates the original URL specification, RFC 1738 (1994). RFC 3986 defines the grammar for URIs, the exact set of unreserved characters, and when percent-encoding is required versus optional. RFC 3987 extends this to Internationalized Resource Identifiers (IRIs), which allow Unicode directly.
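To make the unreserved set concrete, here is a small JavaScript sketch. isUnreserved and encodeRfc3986 are hypothetical helpers written for illustration, not part of any standard library. The second one tightens encodeURIComponent(), which leaves ! ' ( ) * unencoded even though RFC 3986 classes them as sub-delimiters rather than unreserved characters.

```javascript
// RFC 3986 section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: true if a single character never needs encoding.
function isUnreserved(ch) {
  return UNRESERVED.test(ch);
}

// Hypothetical helper: encodeURIComponent() plus the five sub-delimiters
// it skips (! ' ( ) *), so only unreserved characters stay literal.
function encodeRfc3986(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (done)!")); // it's%20(done)!
console.log(encodeRfc3986("it's (done)!"));      // it%27s%20%28done%29%21
```

The stricter form matters mainly when a receiver or a signature scheme expects exact RFC 3986 output; for most URLs the built-in is enough.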
URL encoding in different contexts
The rules shift depending on where in the URL you're encoding:
- Query strings: ?name=hello%20world, where spaces can be %20 or +
- Path segments: /users/John%20Doe, where spaces must be %20 and + is a literal plus
- Form data: application/x-www-form-urlencoded uses + for spaces and encodes most special characters
- Fragment identifiers: #section%20two, encoded like the rest of the URL but never sent to the server
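A short JavaScript sketch of those contexts, using only the built-in encodeURIComponent() and URLSearchParams. The values are placeholders borrowed from the list.

```javascript
// Path segment: encodeURIComponent() emits %20 for spaces and also
// encodes "/", so a value cannot add an extra path segment.
const path = "/users/" + encodeURIComponent("John Doe");
console.log(path); // /users/John%20Doe

// Query built with URLSearchParams: it serializes as
// application/x-www-form-urlencoded, so spaces become "+".
const query = new URLSearchParams({ name: "hello world" }).toString();
console.log(query); // name=hello+world

// The same value through encodeURIComponent() uses %20 instead.
const query20 = "name=" + encodeURIComponent("hello world");
console.log(query20); // name=hello%20world

// Fragment: encoded like any other component, but the browser keeps it
// client-side; it is not part of the HTTP request.
const fragment = "#" + encodeURIComponent("section two");
console.log(fragment); // #section%20two
```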
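Each context has a slightly different set of characters that may appear unencoded: RFC 3986 allows / and ? inside a query or fragment, for example, while in a path / separates segments. A + in a query string might mean a space (form-encoded) or a literal plus (RFC 3986), depending on whether the server parses the query as form data.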
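The ambiguity is easy to reproduce in JavaScript, where the two built-in decoders take opposite sides. The query string below is a made-up example.

```javascript
const encodedValue = "1+1%3D2";

// decodeURIComponent() follows RFC 3986: "+" stays a literal plus.
const rfcDecoded = decodeURIComponent(encodedValue);
console.log(rfcDecoded); // 1+1=2

// URLSearchParams parses form-encoded data: "+" becomes a space.
const formDecoded = new URLSearchParams("q=" + encodedValue).get("q");
console.log(formDecoded); // 1 1=2

// Encoding a literal plus as %2B makes both decoders agree.
const safe = encodeURIComponent("1+1=2");
console.log(safe); // 1%2B1%3D2
console.log(new URLSearchParams("q=" + safe).get("q")); // 1+1=2
```

Because encodeURIComponent() always turns + into %2B, values it produces decode the same way whichever convention the server follows.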
Unicode in URLs
Non-ASCII characters — emoji, accented letters, CJK characters — are first converted to their UTF-8 byte representation, then each byte is percent-encoded individually:
café → caf%C3%A9
é = U+00E9
UTF-8 bytes: 0xC3 0xA9
Encoded: %C3%A9
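A minimal sketch of those two steps in JavaScript, using the standard TextEncoder (available in browsers and Node.js). percentEncodeUtf8 and isUnreservedByte are hypothetical helpers for illustration. They encode every byte outside the RFC 3986 unreserved set, so unlike encodeURIComponent() they also encode ! ' ( ) *.

```javascript
// "café" written with an escape so the é is the precomposed U+00E9
// (a decomposed e + U+0301 would produce different bytes).
const word = "caf\u00e9";

// RFC 3986 unreserved bytes: A-Z a-z 0-9 - . _ ~
function isUnreservedByte(b) {
  return (
    (b >= 0x41 && b <= 0x5a) || // A-Z
    (b >= 0x61 && b <= 0x7a) || // a-z
    (b >= 0x30 && b <= 0x39) || // 0-9
    b === 0x2d || b === 0x2e || b === 0x5f || b === 0x7e // - . _ ~
  );
}

// Step 1: convert to UTF-8 bytes. Step 2: percent-encode each byte
// that is not unreserved, using two uppercase hex digits.
function percentEncodeUtf8(str) {
  const bytes = new TextEncoder().encode(str); // TextEncoder always emits UTF-8
  let out = "";
  for (const b of bytes) {
    out += isUnreservedByte(b)
      ? String.fromCharCode(b)
      : "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

const eBytes = Array.from(new TextEncoder().encode("\u00e9"), (b) => b.toString(16));
console.log(eBytes.join(" ")); // c3 a9
console.log(percentEncodeUtf8(word)); // caf%C3%A9
console.log(percentEncodeUtf8(word) === encodeURIComponent(word)); // true
```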
IRIs (RFC 3987) allow Unicode directly in identifiers, and modern browsers display decoded Unicode in the address bar. Under the hood, though, the path and query of the HTTP request are sent as percent-encoded UTF-8 bytes (internationalized hostnames are converted to Punycode instead). If you're building URLs programmatically from non-ASCII data in JavaScript, encode each component with encodeURIComponent(), which handles the UTF-8 conversion automatically.
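The WHATWG URL class (global in browsers and Node.js) shows what actually goes on the wire. example.com and münchen.de are placeholder hosts; nothing is fetched.

```javascript
// Non-ASCII characters in the path and query are percent-encoded as UTF-8.
const url = new URL("https://example.com/caf\u00e9?q=\u00fcber");
console.log(url.pathname); // /caf%C3%A9
console.log(url.search);   // ?q=%C3%BCber

// searchParams decodes the UTF-8 bytes back into Unicode.
console.log(url.searchParams.get("q") === "\u00fcber"); // true

// Internationalized hostnames are converted to Punycode, not percent-encoded.
const host = new URL("https://m\u00fcnchen.de/").hostname;
console.log(host); // xn--mnchen-3ya.de

// When assembling a URL from strings, encode each component yourself.
const manual = "https://example.com/search?q=" + encodeURIComponent("\u00fcber");
console.log(manual); // https://example.com/search?q=%C3%BCber
```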
Troubleshooting
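Non-ASCII characters show as garbage after decoding: encoding mismatch. The URL was percent-encoded from UTF-8 bytes, but the decoder is interpreting each byte as a Latin-1 (or other single-byte) character, so café comes out as cafÃ©. Make sure both sides agree on UTF-8. In JavaScript, decodeURIComponent() always decodes as UTF-8, which is almost always correct; if it receives bytes that aren't valid UTF-8 (for example %E9 from a Latin-1 encoder), it throws a URIError instead of returning garbage.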
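Both failure modes can be reproduced in a few lines of JavaScript. decodeAsLatin1 is a deliberately wrong, hypothetical decoder that maps each byte straight to a code point, which is exactly how Latin-1 reads bytes.

```javascript
const encoded = "caf%C3%A9"; // "café" percent-encoded from UTF-8 bytes

// Correct: decodeURIComponent() interprets the bytes as UTF-8.
const good = decodeURIComponent(encoded);
console.log(good); // café

// Hypothetical buggy decoder: one byte becomes one character.
function decodeAsLatin1(s) {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}
console.log(decodeAsLatin1(encoded)); // cafÃ©  (0xC3 is Ã, 0xA9 is ©)

// The opposite mismatch: a Latin-1 encoder sends é as the single byte %E9,
// which is not valid UTF-8, so decodeURIComponent() throws.
let threw = false;
try {
  decodeURIComponent("caf%E9");
} catch (err) {
  threw = err instanceof URIError;
}
console.log(threw); // true
```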
API returns 400 when a parameter value contains special characters: the value likely isn't being encoded before it's placed in the URL. Wrap it with encodeURIComponent(). Common culprits: email addresses (contain @, and plus-addressed ones contain +, which a form decoder turns into a space), file paths (contain /), and search queries (contain spaces, &, and other punctuation).
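The sketch below shows how an unencoded value gets misread and how encoding each value fixes it. The endpoint and the buildUrl helper are hypothetical, and nothing is sent; the URL class stands in for the server's query parser.

```javascript
// Hypothetical endpoint, for illustration only.
const BASE = "https://api.example.com/users";

// Encode every key and value before joining them into a query string.
function buildUrl(base, params) {
  const pairs = Object.entries(params).map(
    ([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value)
  );
  return base + "?" + pairs.join("&");
}

const params = { email: "jane+test@example.com", q: "Q&A" };

// Unencoded: "+" is read as a space and "&" starts a new parameter.
const broken = new URL(BASE + "?email=" + params.email + "&q=" + params.q);
console.log(broken.searchParams.get("email")); // jane test@example.com
console.log(broken.searchParams.get("q"));     // Q

// Encoded: both values survive the round trip.
const fixed = buildUrl(BASE, params);
console.log(fixed); // https://api.example.com/users?email=jane%2Btest%40example.com&q=Q%26A
console.log(new URL(fixed).searchParams.get("email")); // jane+test@example.com
console.log(new URL(fixed).searchParams.get("q"));     // Q&A
```

new URLSearchParams(params).toString() would also produce a safe query; it differs only in writing spaces as + rather than %20.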