The Complete Guide to JSON Validation
Validating JSON sounds trivial until a missing comma takes down your CI pipeline or a BOM character silently corrupts a webhook payload. The syntax is simple — six types, a handful of rules — but the sharp corners trip up even experienced developers. This guide covers the spec, the ecosystem, and the mistakes that cost you debugging time.
JSON's origin story
Douglas Crockford didn't invent JSON — he discovered it. Around 2001, he formalized the object literal syntax that JavaScript had used since 1997 and gave it a name. The key insight was making it language-independent: any language could parse it, not just JavaScript.
It went through three RFCs: RFC 4627 (2006, informational), RFC 7159 (2014), and RFC 8259 (2017, the current standard). JSON replaced XML as the default interchange format for web APIs because it was smaller, easier to parse, and mapped naturally to data structures in most languages.
The complete JSON spec in 60 seconds
Six types: string, number, object, array, boolean (true/false), and null. That's it. No dates, no binary, no undefined.
The rules that catch people:
- Keys must be double-quoted strings: `{name: "val"}` is invalid
- No trailing commas: `{"a": 1,}` is invalid
- No comments: `// nope` and `/* nope */` will fail parsing
- No single quotes: `{'key': 'val'}` is invalid
- No `undefined`, `NaN`, or `Infinity`: these aren't JSON values
- Numbers can't have leading zeros: `{"n": 07}` is invalid
- Unicode escapes use `\uXXXX`, with surrogate pairs for characters above U+FFFF
- A valid JSON document can be any of the six types, not just an object or array (since RFC 7159)
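The rules above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `is_strict_json` and `_reject_constant` are names made up for this example. One caveat worth knowing: Python's `json.loads` accepts `NaN` and `Infinity` by default, so the sketch passes a `parse_constant` hook to reject them.

```python
import json


def _reject_constant(name):
    # json.loads calls this hook for NaN, Infinity and -Infinity.
    # Python accepts them by default even though they are not JSON values.
    raise ValueError(f"{name} is not a valid JSON value")


def is_strict_json(text: str) -> bool:
    """Return True if text parses as JSON under the rules listed above."""
    try:
        json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


if __name__ == "__main__":
    samples = [
        '{name: "val"}',      # unquoted key
        '{"a": 1,}',          # trailing comma
        "{'key': 'val'}",     # single quotes
        '{"n": 07}',          # leading zero
        '{"n": NaN}',         # not a JSON value
        '"just a string"',    # scalar at the top level is allowed
    ]
    for sample in samples:
        print(f"{sample!r}: {is_strict_json(sample)}")
```

Other parsers have their own leniencies, so it is worth running the same samples through whatever parser your production code uses.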
JSON vs JSONC vs JSON5 vs YAML vs TOML
| Feature | JSON | JSONC | JSON5 | YAML | TOML |
|---|---|---|---|---|---|
| Comments | ✗ | // /* */ | // /* */ | # | # |
| Trailing commas | ✗ | ✓ | ✓ | N/A | ✗ |
| Unquoted keys | ✗ | ✗ | ✓ | ✓ | ✓ (bare) |
| Multiline strings | ✗ | ✗ | ✓ | ✓ | ✓ (""") |
| Date type | ✗ | ✗ | ✗ | ✓ | ✓ |
| Typical use | APIs | VS Code | Configs | K8s, CI | Rust, Python |
Pick the right tool: JSON for interchange between systems. JSONC/JSON5 for config files humans edit. YAML for Kubernetes and CI pipelines. TOML for Cargo.toml and pyproject.toml.
Beyond syntax: JSON Schema validation
This tool validates syntax — is it parseable JSON? But production systems often need structural validation: does this object have the right fields, with the right types and constraints?
That's what JSON Schema does. A quick example:
```json
{
  "type": "object",
  "required": ["name", "age"],
  "properties": {
    "name": { "type": "string", "minLength": 1 },
    "age": { "type": "integer", "minimum": 0 }
  }
}
```
This schema rejects {"name": "", "age": -1} — syntactically valid JSON, structurally invalid. The current spec is Draft 2020-12. Popular validators: ajv (JavaScript, fast), jsonschema (Python).
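To make the syntax-versus-structure distinction concrete, here is a hand-written Python sketch of the checks that schema expresses. It is not a JSON Schema implementation, and `check_person` is a hypothetical helper for this example; in a real project a validator library would derive these checks from the schema itself.

```python
import json


def check_person(document: str) -> list[str]:
    """Return a list of structural problems; an empty list means it passed."""
    data = json.loads(document)  # syntax check: raises on unparseable input
    if not isinstance(data, dict):
        return ["document must be an object"]

    errors = []
    for field in ("name", "age"):
        if field not in data:
            errors.append(f"missing required field: {field}")

    if "name" in data:
        name = data["name"]
        if not (isinstance(name, str) and len(name) >= 1):
            errors.append("name must be a non-empty string")

    if "age" in data:
        age = data["age"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        # Simplification: JSON Schema also accepts 1.0 as an integer.
        is_integer = isinstance(age, int) and not isinstance(age, bool)
        if not (is_integer and age >= 0):
            errors.append("age must be an integer >= 0")

    return errors


if __name__ == "__main__":
    print(check_person('{"name": "", "age": -1}'))
    print(check_person('{"name": "Ada", "age": 36}'))
```

The first call parses without complaint and still reports two problems, which is exactly the gap a syntax-only validator leaves open.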
Where validation matters in production
- API contract testing: validate request/response shapes in CI before deployment
- CI pipeline configs: a missing comma in `.github/workflows/*.yml` (which often embeds JSON) breaks your pipeline
- Webhook payloads: third-party services send JSON with no guarantees; validate before processing
- Database inputs: PostgreSQL rejects malformed JSON written to a `JSONB` column, so the insert fails; JSON kept in a plain text column is not checked and can surface later as corrupt data or query failures
- Debugging: when a third-party API returns an error, paste the response here to check if the JSON itself is broken
JSON antipatterns
Deep nesting (>3-4 levels) — Usually a sign your data model needs flattening or normalization. Deep JSON is hard to query, hard to diff, and produces terrible error messages.
Binary data as Base64-in-JSON — Works but bloats payload size by 33%. If you're moving files, use multipart uploads or presigned URLs.
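The 33% figure follows from how Base64 works: every 3 input bytes become 4 output characters. A quick sketch to confirm the arithmetic, with `base64_overhead` as a made-up helper name:

```python
import base64


def base64_overhead(raw_size: int) -> float:
    """Return the fractional size increase from Base64-encoding raw_size bytes."""
    encoded_size = len(base64.b64encode(bytes(raw_size)))
    return encoded_size / raw_size - 1


if __name__ == "__main__":
    # 3000 raw bytes encode to 4000 characters: 4/3 of the original size.
    print(f"{base64_overhead(3000):.1%}")
```

Padding adds a few extra characters when the input length is not a multiple of 3, and JSON string quoting adds two more, so the real overhead is slightly above one third for small payloads.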
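Giant monolithic files: A 500 MB JSON array has to be read in full by a standard parser (JSON.parse, Python's json.load) before you can touch the first element; streaming it takes a specialized incremental parser. For large datasets, use JSON Lines (NDJSON): one JSON object per line, streamable and splittable.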
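Reading JSON Lines needs nothing beyond the standard library, because each line is a complete JSON document. A minimal sketch, assuming one value per line and treating blank lines as skippable; `iter_json_lines` is a name invented for this example:

```python
import io
import json


def iter_json_lines(stream):
    """Yield one parsed value per non-empty line, holding one line in memory."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which record is broken instead of failing the whole file.
            raise ValueError(f"line {line_number}: {exc.msg}") from exc


if __name__ == "__main__":
    data = io.StringIO('{"id": 1}\n{"id": 2}\n\n{"id": 3}\n')
    for record in iter_json_lines(data):
        print(record["id"])
```

Because the function is a generator, it works the same way on an open file or a network stream, and a bad record is reported with its line number.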
JSON for human-edited config — No comments, no trailing commas, strict quoting. Every edit risks a syntax error. Use JSONC, JSON5, YAML, or TOML for configs that humans touch regularly.
Troubleshooting
"Unexpected token" but the line looks correct — Check for invisible characters: zero-width spaces (U+200B), smart quotes from word processors, or a BOM (U+FEFF) at the start of the file. Paste into a plain text editor and re-inspect.
"Unexpected end of JSON input" — Your JSON is truncated. A closing } or ] is missing somewhere. Count your brackets, or use a formatter that highlights mismatches.
Error points to a line that looks fine — The real mistake is almost always on the previous line. A missing comma after a value causes the parser to choke on the next property name.
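You can see this off-by-one behavior with Python's parser. In the sketch below the comma is missing at the end of line 2, yet the reported position is on line 3. `locate_error` is a hypothetical helper; `lineno`, `colno`, and `msg` are attributes of `json.JSONDecodeError`.

```python
import json

# The comma is missing after "a": 1 on line 2.
BROKEN = '{\n  "a": 1\n  "b": 2\n}'


def locate_error(text: str):
    """Return (line, column, message) for the first parse error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


if __name__ == "__main__":
    print(locate_error(BROKEN))
```

The parser only notices the problem when it reaches the next token, so when an error location looks clean, check the end of the line before it.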
Valid JSON but my app still can't parse it — Check the file encoding. If it's UTF-16 and your parser expects UTF-8, parsing will fail or produce garbled output. Also check for a BOM (\uFEFF) — some parsers reject it, others silently skip it.
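The encoding and invisible-character checks described in this section can be sketched as follows. `SUSPECTS`, `find_suspects`, and `load_json_bytes` are names made up for this example. The sketch relies on Python's `json.loads` accepting `bytes` and detecting UTF-8, UTF-16, or UTF-32 (including a leading BOM) on its own; other parsers may behave differently.

```python
import json

# Characters that look harmless in an editor but break JSON syntax when
# they appear outside a string value.
SUSPECTS = {
    "\ufeff": "byte order mark (U+FEFF)",
    "\u200b": "zero-width space (U+200B)",
    "\u201c": "left smart quote (U+201C)",
    "\u201d": "right smart quote (U+201D)",
}


def find_suspects(text: str) -> list[tuple[int, str]]:
    """Return (offset, description) for every suspicious character."""
    return [(i, SUSPECTS[ch]) for i, ch in enumerate(text) if ch in SUSPECTS]


def load_json_bytes(raw: bytes):
    """Parse raw bytes and let the json module work out the encoding."""
    return json.loads(raw)


if __name__ == "__main__":
    print(find_suspects('\ufeff{\u201ca\u201d: 1}'))
    print(load_json_bytes('{"a": 1}'.encode("utf-16")))
```

The scanner is deliberately blunt: it also flags these characters inside string values, where they are legal, so treat its output as a list of places to look rather than a list of errors.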
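Duplicate keys don't cause an error: RFC 8259 says object names SHOULD be unique and describes the behavior of software that receives duplicates as unpredictable. Most parsers silently keep the last value, and jq does the same, so it will not flag the problem. Don't rely on this; use a linter or parser option that explicitly checks for duplicate keys (some jsonlint implementations offer one).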
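If your parser exposes the raw key/value pairs, you can catch duplicates yourself. In Python, `json.loads` takes an `object_pairs_hook` that receives every pair of an object before they are collapsed into a dict. A minimal sketch; `reject_duplicates` and `loads_strict` are names invented for this example:

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises when a key repeats within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def loads_strict(text: str):
    """Parse JSON, failing on duplicate keys instead of keeping the last one."""
    return json.loads(text, object_pairs_hook=reject_duplicates)


if __name__ == "__main__":
    print(json.loads('{"a": 1, "a": 2}'))  # default: last value wins
    try:
        loads_strict('{"a": 1, "a": 2}')
    except ValueError as exc:
        print(exc)
```

The hook runs once per object, so the same key appearing in two different nested objects is still accepted.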
Numbers lose precision: The JSON grammar puts no limit on the size of a number, but JavaScript and many other parsers read every number as an IEEE 754 double-precision float. Integers above 2^53 (9,007,199,254,740,992) can then silently lose precision. If you're working with Snowflake IDs, Twitter IDs, or other large integers, transmit them as strings.
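The effect is easy to reproduce. Python's own `json` module keeps integers exact, so the sketch below passes `parse_int=float` to imitate a parser that stores every number as a double; the variable names are made up for this example.

```python
import json

BIG = "9007199254740993"  # 2**53 + 1

# Python parses JSON integers as arbitrary-precision ints, so nothing is lost.
exact = json.loads(BIG)

# Imitate a double-only parser by forcing integers through float.
as_double = json.loads(BIG, parse_int=float)

# Sending the ID as a string survives a round trip through any parser.
round_tripped = json.loads(json.dumps({"id": BIG}))["id"]

if __name__ == "__main__":
    print(exact == 2**53 + 1)        # True
    print(int(as_double) == 2**53)   # True: the trailing 3 became a 2
    print(round_tripped == BIG)      # True
```

The string form costs two extra bytes for the quotes and makes the consumer convert explicitly, which is usually a fair trade for IDs that are never used in arithmetic.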