The Complete Guide to Base64 Encoding
Base64 is one of those things every developer uses but few stop to think about. This guide covers where it came from, how it actually works, and the sharp edges you'll hit in production.
The history: why Base64 exists
In the early days of email, SMTP gateways assumed messages were 7-bit ASCII. Binary attachments (images, PDFs, executables) would get mangled or silently dropped by any hop that stripped the high bit. The MIME standard (first published as RFC 1341 in 1992, revised as RFC 2045 in 1996) solved this by defining content-transfer encodings that could represent arbitrary bytes using only printable ASCII characters.
Base64 was the workhorse encoding. RFC 4648 (2006) later formalized it as a standalone spec, decoupled from email. The motivation was never security — it was interoperability. Make binary survive text-only pipes.
How the algorithm works
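Base64 takes every 3 input bytes (24 bits), splits them into four 6-bit groups, and maps each group to one of 64 ASCII characters: A-Z, a-z, 0-9, +, /. When the input length isn't divisible by 3, the final group is zero-filled and the output is padded with = to a multiple of four characters: one = when two bytes are left over, two when only one is.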
Worked example: encoding "Hi"
Input bytes: H = 0x48 (01001000), i = 0x69 (01101001)
Combined bits: 01001000 01101001
Pad to 24: 01001000 01101001 00000000
6-bit groups: 010010 000110 100100 000000
Indexes: 18 6 36 (pad)
Characters: S G k =
Result: SGk=
Two input bytes → three Base64 characters plus one = pad.
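The steps in the worked example can be sketched as a small encoder. This is a minimal illustration, not a replacement for the built-in APIs; the names ALPHABET and encodeBase64 are made up for this example.

```javascript
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode a Uint8Array (or plain array of byte values) as padded standard Base64.
function encodeBase64(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // 1, 2, or at least 3 bytes left
    // Pack up to three bytes into one 24-bit number, zero-filling missing bytes.
    const chunk =
      (bytes[i] << 16) |
      ((remaining > 1 ? bytes[i + 1] : 0) << 8) |
      (remaining > 2 ? bytes[i + 2] : 0);
    // Slice the 24 bits into four 6-bit indexes into the alphabet.
    out += ALPHABET[(chunk >> 18) & 63];
    out += ALPHABET[(chunk >> 12) & 63];
    // Groups that contain no input bits at all are emitted as '='.
    out += remaining > 1 ? ALPHABET[(chunk >> 6) & 63] : '=';
    out += remaining > 2 ? ALPHABET[chunk & 63] : '=';
  }
  return out;
}

console.log(encodeBase64(new TextEncoder().encode('Hi'))); // SGk=
```

In real code, prefer the platform APIs covered later in this guide; the sketch only makes the bit slicing visible.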
Base64 vs Base64URL vs Base32 vs Hex
| Encoding | Alphabet | Size ratio (output:input) | Use case |
|---|---|---|---|
| Base64 | A-Z, a-z, 0-9, +, / | 4:3 | MIME, data URIs, general binary |
| Base64URL | A-Z, a-z, 0-9, -, _ (padding usually omitted) | 4:3 | JWTs, URL params, filenames |
| Base32 | A-Z, 2-7 | 8:5 | Case-insensitive, easy to type (TOTP secrets) |
| Hex | 0-9, a-f | 2:1 | Hashes, debugging, simplicity |
Base64URL swaps + and / for - and _ and typically drops padding. This is what JWTs use — if you try to decode a JWT segment with a standard Base64 decoder, you'll get errors or garbage.
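Converting between the two alphabets is plain string substitution. The sketch below assumes well-formed input and does no validation; toBase64Url and fromBase64Url are hypothetical helper names, not part of any library.

```javascript
// Standard Base64 -> Base64URL: swap the two URL-unsafe characters, drop padding.
function toBase64Url(b64) {
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Base64URL -> standard Base64: swap back, then restore padding
// so the length is a multiple of 4 again.
function fromBase64Url(b64url) {
  const b64 = b64url.replace(/-/g, '+').replace(/_/g, '/');
  return b64 + '='.repeat((4 - (b64.length % 4)) % 4);
}

console.log(toBase64Url('+/8='));  // -_8
console.log(fromBase64Url('-_8')); // +/8=
```

After fromBase64Url, the string can be passed to any standard Base64 decoder.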
Base64 in production
Base64 shows up everywhere once you start looking:
- JWTs: all three segments (header, payload, signature) are Base64URL-encoded
- Data URIs: data:image/png;base64,iVBOR... embeds assets inline in CSS/HTML
- HTTP Basic Auth: the header Authorization: Basic dXNlcjpwYXNz carries user:pass
- SMTP/MIME: the original use case; email attachments are still Base64
- Kubernetes "secrets": stored as Base64 in manifests (this is encoding, not encryption)
- AWS request signing: the legacy Signature Version 2 scheme Base64-encodes the signature bytes in the Authorization header (Signature Version 4 uses hex instead)
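As one concrete case from the list, here is a minimal sketch of building a Basic Auth header value in Node.js. The function name basicAuthHeader is hypothetical, and the sketch assumes the credentials should be UTF-8 encoded before Base64.

```javascript
// Build the value of an HTTP Basic Authorization header.
// Buffer is Node-specific; in a browser you would go through TextEncoder and btoa.
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

console.log(basicAuthHeader('user', 'pass')); // Basic dXNlcjpwYXNz
```

Anyone who sees the header can reverse it, which is why Basic Auth is only acceptable over TLS.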
The 33% overhead problem
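Three input bytes become four output characters, a 33% size increase before any line wrapping. A 1 MB image becomes ~1.33 MB as Base64. Transport compression only partly helps: gzip and brotli can claw back some of the overhead because the output uses just 64 distinct symbols, but already-compressed formats such as PNG and JPEG gain nothing further, and Base64 shifts byte boundaries, so repetitive structure in the original binary is harder for the compressor to find.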
Rule of thumb: inline assets under ~5 KB as data URIs. Above that, serve the binary file with proper caching headers. A 200 KB PNG inlined as Base64 in your CSS will bloat the stylesheet, defeat caching, and slow down first paint.
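The overhead is easy to compute ahead of time. A small sketch (base64Length is a made-up helper name) for padded Base64 without line wrapping:

```javascript
// Padded Base64 emits 4 characters for every group of up to 3 input bytes.
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

console.log(base64Length(2));       // 4
console.log(base64Length(5000));    // 6668
console.log(base64Length(1000000)); // 1333336
```

This can back a build-time check that refuses to inline assets above whatever threshold you choose.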
Base64 is NOT encryption
This cannot be overstated. Base64 uses no key. There is no secret. Anyone with a terminal can reverse it:
echo "cGFzc3dvcmQ=" | base64 -d
# → password
Kubernetes secrets are just Base64-encoded. Anyone who can run kubectl get secret -o yaml can read every "secret" in the output. For real secret management, use a dedicated store (Vault, AWS Secrets Manager), encrypt manifests with SOPS, or enable encryption at rest for etcd.
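To make the point concrete, here is a sketch that decodes every value in a Secret's data field using Node.js. The secret object is hypothetical sample data shaped like a Kubernetes Secret serialized as JSON; decodeSecretData is a made-up helper name.

```javascript
// Hypothetical sample data in the shape of a Kubernetes Secret (as JSON).
const secret = {
  kind: 'Secret',
  data: { username: 'YWRtaW4=', password: 'cGFzc3dvcmQ=' },
};

// Decode each Base64 value; no key or credential is involved at any point.
function decodeSecretData(data) {
  return Object.fromEntries(
    Object.entries(data).map(([key, value]) => [
      key,
      Buffer.from(value, 'base64').toString('utf8'),
    ])
  );
}

console.log(decodeSecretData(secret.data));
// { username: 'admin', password: 'password' }
```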
Browser and Node.js APIs
The browser gives you btoa() and atob(), but they only handle Latin-1 (code points 0–255). Emoji, CJK characters, or anything above U+00FF will throw InvalidCharacterError.
Legacy workaround for UTF-8 in browsers (escape() and unescape() are deprecated, so prefer TextEncoder in new code):
// Encode
const encoded = btoa(unescape(encodeURIComponent(str)));
// Decode
const decoded = decodeURIComponent(escape(atob(encoded)));
Or the modern approach with TextEncoder:
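// Encode: string -> UTF-8 bytes -> binary string -> Base64
const bytes = new TextEncoder().encode(str);
// Map byte by byte; spreading a large array into String.fromCodePoint(...bytes)
// can exceed the engine's argument limit.
const binary = Array.from(bytes, (b) => String.fromCodePoint(b)).join('');
const encoded = btoa(binary);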
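The TextEncoder snippet covers encoding only. A sketch of the full round trip, with decoding through TextDecoder, might look like this. The function names utf8ToBase64 and base64ToUtf8 are hypothetical, and the code assumes an environment where btoa, atob, TextEncoder, and TextDecoder are globals (modern browsers, Node.js 16+).

```javascript
// String -> UTF-8 bytes -> binary string (one char per byte) -> Base64
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  const binary = Array.from(bytes, (b) => String.fromCodePoint(b)).join('');
  return btoa(binary);
}

// Base64 -> binary string -> bytes -> string decoded as UTF-8
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.codePointAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(utf8ToBase64('é'));    // w6k=
console.log(base64ToUtf8('w6k=')); // é
```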
Node.js makes this cleaner:
Buffer.from(str, 'utf8').toString('base64'); // encode
Buffer.from(b64, 'base64').toString('utf8'); // decode
Troubleshooting
JWT middle segment won't decode: It's Base64URL, not standard Base64. Replace - with +, _ with /, pad the string with = to a multiple of 4 characters, then decode.
btoa() throws "InvalidCharacterError": Your input contains characters outside Latin-1 (code points > 255). Use the TextEncoder approach or the encodeURIComponent workaround shown above.
Decoded output is garbled text: Encoding mismatch. The data was likely encoded as UTF-8 but you're decoding as Latin-1, or vice versa. Check the source system's encoding and match it on decode.
Base64 string has line breaks every 76 characters: That's valid MIME Base64 (RFC 2045 limits encoded lines to 76 characters). Most decoders handle it fine. If yours doesn't, strip all whitespace before decoding.
Decoded file is corrupt: Usually a trailing newline or other whitespace character got copied with the Base64 string. Trim the input and retry. Also verify that nothing truncated the string; partial Base64 decodes to partial (and broken) binary.
Huge HTML or CSS from inlined Base64 images: Inlining adds about 33% before compression and destroys cache granularity, since any change to the page or stylesheet forces re-downloading every inlined image. Keep data URIs under ~5 KB; serve larger assets as separate files with cache headers.
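Several of the fixes above (whitespace, URL-safe alphabet, missing padding, truncation) can be combined into one defensive pre-processing step. The helper below is a hypothetical sketch, not a library function; it throws on input it cannot repair.

```javascript
// Clean up a Base64 or Base64URL string before handing it to a decoder.
function normalizeBase64(input) {
  // Drop MIME line breaks, spaces, and stray trailing newlines from copy/paste.
  let s = input.replace(/\s+/g, '');
  // Map the Base64URL alphabet back to the standard one.
  s = s.replace(/-/g, '+').replace(/_/g, '/');
  // Strip existing padding so it can be recomputed consistently.
  s = s.replace(/=+$/, '');
  if (!/^[A-Za-z0-9+/]*$/.test(s)) {
    throw new Error('Input contains characters outside the Base64 alphabet');
  }
  // A remainder of 1 never occurs in valid Base64; the string was likely truncated.
  if (s.length % 4 === 1) {
    throw new Error('Invalid length: the Base64 string may be truncated');
  }
  return s + '='.repeat((4 - (s.length % 4)) % 4);
}

const wrapped = 'eyJzdWIi\nOiIxIn0 \n';
console.log(Buffer.from(normalizeBase64(wrapped), 'base64').toString('utf8'));
// {"sub":"1"}
```

A length check only catches some truncations. If the data matters, compare a checksum of the decoded bytes against the source.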