The Complete Guide to CSV and JSON Conversion
CSV and JSON are two of the most common data formats in a developer's daily workflow. Converting between them sounds trivial — until you actually try to handle it correctly. This guide covers the sharp edges, the lossy tradeoffs, and the tools that save you from writing your own parser.
CSV quirks
The edge cases around quoting, delimiters, and escaping are covered on the tool page. The key thing to remember: CSV has zero type information and no standard encoding, so your parser will have to guess — and RFC 4180 only gets you so far.
When to convert CSV to JSON
Most of the time, this conversion happens because you have tabular data and need it in a structured format:
- Feeding spreadsheet exports to APIs — your product team hands you a
.csvand the endpoint expects JSON - Loading data into NoSQL databases — MongoDB, DynamoDB, and Firestore all want documents, not rows
- Transforming tabular data for frontend display — JavaScript works naturally with objects and arrays, not delimited strings
- Scripting data pipelines —
jqcan filter and transform JSON; there's no equivalent CSV toolchain with that power
JSON to CSV: the lossy direction
Going from JSON to CSV is inherently lossy. JSON supports nesting, arrays, mixed types, and null. CSV is a flat grid of strings.
Nested objects have to be flattened. The most common approach is dot notation — address.city becomes a column header. But there's no standard for this, and deeply nested structures produce absurd column names like orders.0.items.2.price.
Arrays are worse. If a user has three email addresses and another has one, your CSV either wastes columns (email_1, email_2, email_3) or serializes the array as a JSON string inside a CSV field — which defeats the purpose.
{ "name": "Ada", "tags": ["admin", "dev"], "address": { "city": "London" } }
Flattened to CSV:
name,tags,address.city
Ada,"[""admin"",""dev""]",London
Type information is gone. The number 42 and the boolean true become the strings "42" and "true". If you round-trip JSON → CSV → JSON, you won't get back what you started with.
CSV alternatives worth knowing
CSV persists because it's universal, but better options exist for specific use cases:
- TSV (tab-separated) — avoids the comma-quoting mess since tabs rarely appear in data. Same limitations otherwise.
- JSON Lines / NDJSON — one JSON object per line. Streamable, splittable, retains types. Best of both worlds for data pipelines.
- Parquet — columnar binary format for big data. Typed, compressed, fast to query. Standard in the Spark/Arrow ecosystem.
- Excel / XLSX — rich formatting and multiple sheets, but requires libraries to read (
xlsx,openpyxl). Not a text format.
If you control both the producer and consumer, JSON Lines is almost always a better choice than CSV.
Handling CSV in code
The single most important rule: don't write your own CSV parser. The edge cases — quoted fields, escaped quotes, newlines in values, mixed encodings — will destroy you. line.split(",") is not a CSV parser.
JavaScript — Papa Parse:
import Papa from "papaparse";
const result = Papa.parse(csvString, { header: true, dynamicTyping: true });
console.log(result.data); // array of objects
dynamicTyping: true attempts to convert numbers and booleans — useful, but verify the results with your actual data.
Python — built-in csv or pandas:
import pandas as pd
df = pd.read_csv("data.csv")
df.to_json("data.json", orient="records", indent=2)
For quick conversions, pandas is hard to beat. For streaming large files without loading everything into memory, use the csv module with a generator.
Troubleshooting
Columns are misaligned or there are too many columns — A field contains an unquoted comma. The parser splits it into two columns, and every subsequent column shifts right. Fix the source data by quoting fields that contain commas, or use a parser that handles RFC 4180 correctly.
Special characters appear as garbled text (é, â€") — Encoding mismatch. The file is probably UTF-8 but your tool is reading it as Latin-1, or vice versa. In Papa Parse, set encoding: "UTF-8". In Python, use open("file.csv", encoding="utf-8"). When in doubt, check the file with file -I data.csv on macOS/Linux.
First row is treated as data instead of headers — Most parsers don't auto-detect header rows. Set header: true in Papa Parse or header=0 in pandas. If the file genuinely has no headers, you'll need to provide column names manually.
Large CSV files freeze the browser or crash the tab — CSV parsing is CPU-intensive and browsers have memory limits. For files over ~50 MB, process them server-side or use a streaming parser. Papa Parse supports streaming with step callbacks. In Python, use chunksize in pd.read_csv() to process in batches.
Numbers like 0012345 lose their leading zeros — Parsers with dynamic typing interpret these as integers. If leading zeros matter (zip codes, product codes), disable dynamic typing or explicitly set those columns to string type.