
How to Convert Large JSON Files to CSV (Without Crashing)

Why large JSON files crash most converters — and four approaches that actually work. Covers in-browser streaming, Python ijson for multi-gigabyte files, pandas chunking, and when each method is the right call.

You have a large JSON file. You drag it into an online converter and it freezes. You open it in Python and hit a MemoryError. You try a different tool and get "file too large" at 10MB.

Large JSON files break most converters in one of three ways: the browser runs out of heap memory, the server rejects the upload, or Python loads the entire file into RAM and runs out. None of these are problems with your data — they're problems with the tool.

This guide explains why each failure happens and gives you a working solution for every size range.


Why large JSON files crash converters

The browser memory problem

Most online converters work like this: you upload the file, it goes to a server, the server parses it, the server sends back a CSV. The upload limit is usually somewhere between 5MB and 50MB. Hit that limit and you get an error page.

Browser-based converters (no server) have a different constraint: JavaScript's heap. A 50MB JSON file doesn't consume 50MB of memory — it consumes several times that. The raw string is in memory, the parsed object tree is in memory, the intermediate representation is in memory, and the output CSV string is in memory. All at once. A 50MB JSON file can easily require 400-600MB of working memory in the browser, and the multiplier only gets worse as files grow; a naive converter handed a few hundred megabytes of JSON will exhaust the tab's heap and crash or freeze.

The JSON to CSV converter on this site works differently: it parses the JSON incrementally using a streaming approach, writes the output as it processes, and never holds the full parsed tree alongside the full output string at the same time. There's no upload — it runs entirely in your browser, so there's no server limit to hit either.

The Python memory problem

Python's most common approach to JSON is json.load(f) — which loads the entire file into memory as a Python dict or list. For a 1GB JSON file, you might need 3-8GB of RAM just to hold the parsed structure, before you've done anything with it. This is what causes:

MemoryError

or the pandas equivalent when pd.read_json() tries to build the entire DataFrame at once, or the process just silently consuming all available RAM and getting killed by the OS.

The fix is streaming — parsing the JSON token by token instead of all at once.
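
For contrast, this is the whole-file pattern that hits those limits (a minimal sketch; data.json is a placeholder path):

import json

# Parses the entire file into one Python object in RAM.
# Fine for small files; a MemoryError waiting to happen at multi-gigabyte sizes.
with open("data.json", encoding="utf-8") as f:
    records = json.load(f)

print(len(records))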


Approach 1: The online converter (up to ~500MB)

For most large JSON files, the fastest path is the JSON to CSV converter. Paste your JSON or upload the file directly.

What makes it handle large files:

  • No upload limit — the file never leaves your machine
  • Streaming JSON parsing — processes the array incrementally
  • Writes CSV output in chunks — doesn't accumulate the full output string before downloading
  • No server timeout — it runs in your browser tab, not on a server with a 30-second request limit

Practical ceiling: Browser tabs are typically limited to a few gigabytes of memory by the OS and browser. Very large files — above 500MB — may still hit the browser's heap limit depending on the JSON structure and your machine's available RAM. For files over 500MB, use Python.


Approach 2: Python streaming with ijson (any size)

ijson is a Python library that parses JSON as a stream — it reads the file token by token, emitting Python objects as it finds complete ones. Memory usage stays roughly constant no matter how large the file is, because you only hold one record in memory at a time.

Install

pip install ijson

Basic streaming conversion

import ijson
import csv

def json_to_csv_streaming(json_path, csv_path):
    with open(json_path, "rb") as f_in, open(csv_path, "w", newline="", encoding="utf-8") as f_out:
        writer = None
        
        # Parse the top-level array item by item
        for record in ijson.items(f_in, "item"):
            if writer is None:
                # Use the first record to determine headers
                headers = list(record.keys())
                writer = csv.DictWriter(f_out, fieldnames=headers, extrasaction="ignore")
                writer.writeheader()
            writer.writerow(record)

json_to_csv_streaming("data.json", "data.csv")

ijson.items(f, "item") reads the file assuming the root is a JSON array and yields each element one at a time. Your file can be 10GB — only one record is in RAM at a time.

Handling a nested data key

Most API dumps don't have a bare array at the root. They look like {"data": [...], "meta": {...}}. Adjust the prefix:

for record in ijson.items(f_in, "data.item"):
    ...

The prefix "data.item" tells ijson to navigate to the data key and then yield each item in that array.

Flattening nested objects while streaming

If your records have nested fields you want to flatten to dot-notation columns, apply a flattening function before writing:

def flatten(obj, prefix="", sep="."):
    out = {}
    for key, value in obj.items():
        full_key = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            out[full_key] = "|".join(str(v) for v in value)
        else:
            out[full_key] = value
    return out

for record in ijson.items(f_in, "item"):
    flat = flatten(record)
    ...

Arrays of scalars become pipe-delimited strings. Arrays of objects are a harder problem; see the FAQ at the end for one way to expand them into one row per nested item while streaming.
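
Putting the pieces together, here's a sketch of a streaming conversion that flattens each record before writing. It reuses the flatten() helper above and takes headers from the first flattened record; stream_flatten_to_csv is just an illustrative name:

import ijson
import csv

def stream_flatten_to_csv(json_path, csv_path):
    with open(json_path, "rb") as f_in, open(csv_path, "w", newline="", encoding="utf-8") as f_out:
        writer = None
        for record in ijson.items(f_in, "item"):
            flat = flatten(record)  # the flatten() helper defined above
            if writer is None:
                # Headers come from the first flattened record
                writer = csv.DictWriter(f_out, fieldnames=list(flat.keys()), extrasaction="ignore")
                writer.writeheader()
            writer.writerow(flat)

stream_flatten_to_csv("data.json", "flat.csv")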

Dynamic headers across the stream

If your JSON has sparse fields — different records have different keys — you can't determine the full header set from the first record alone. Two options:

Option A: Two-pass streaming — stream once to collect all keys, stream again to write rows. Works well if reading the file twice is acceptable (local file, not a pipe):

import ijson
import csv

def collect_headers(json_path):
    headers = set()
    with open(json_path, "rb") as f:
        for record in ijson.items(f, "item"):
            headers.update(record.keys())
    return sorted(headers)

def stream_to_csv(json_path, csv_path):
    headers = collect_headers(json_path)
    with open(json_path, "rb") as f_in, open(csv_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.DictWriter(f_out, fieldnames=headers, extrasaction="ignore")
        writer.writeheader()
        for record in ijson.items(f_in, "item"):
            writer.writerow(record)

stream_to_csv("sparse_data.json", "output.csv")

Option B: Buffer and flush — hold N records, discover headers as you go, write in batches. More memory than pure streaming but handles one-pass requirements:

import ijson
import csv

BUFFER_SIZE = 5000

def stream_with_buffer(json_path, csv_path):
    headers_seen = set()
    buffer = []
    writer = None
    
    with open(json_path, "rb") as f_in, open(csv_path, "w", newline="", encoding="utf-8") as f_out:
        for record in ijson.items(f_in, "item"):
            buffer.append(record)
            headers_seen.update(record.keys())
            
            if len(buffer) >= BUFFER_SIZE:
                if writer is None:
                    writer = csv.DictWriter(f_out, fieldnames=sorted(headers_seen), extrasaction="ignore")
                    writer.writeheader()
                writer.writerows(buffer)
                buffer.clear()
        
        # Flush remaining records
        if buffer:
            if writer is None:
                writer = csv.DictWriter(f_out, fieldnames=sorted(headers_seen), extrasaction="ignore")
                writer.writeheader()
            writer.writerows(buffer)

stream_with_buffer("sparse.json", "output.csv")

Approach 3: pandas with chunking (50MB–2GB)

If your JSON is in NDJSON format (newline-delimited JSON — one JSON object per line), pandas can read it in chunks natively:

import pandas as pd

chunk_size = 10_000  # rows per chunk
first_chunk = True

with pd.read_json("data.ndjson", lines=True, chunksize=chunk_size) as reader:
    for chunk in reader:
        chunk.to_csv(
            "output.csv",
            mode="w" if first_chunk else "a",
            header=first_chunk,
            index=False,
        )
        first_chunk = False

print("Done")

lines=True enables NDJSON mode. chunksize returns a JsonReader iterator instead of loading everything at once.

Is your file NDJSON? It looks like this — one complete JSON object per line, no outer array brackets:

{"id": 1, "name": "Alice", "score": 95}
{"id": 2, "name": "Bob", "score": 87}
{"id": 3, "name": "Sara", "score": 91}

Standard JSON (an array) looks like this:

[
  {"id": 1, "name": "Alice", "score": 95},
  {"id": 2, "name": "Bob", "score": 87}
]

pandas chunksize only works with NDJSON. For standard JSON arrays, use ijson.

Converting standard JSON to NDJSON first

If you have a standard JSON array and want to use pandas chunking, convert it to NDJSON first with a streaming pass:

import ijson
import json

with open("data.json", "rb") as f_in, open("data.ndjson", "w") as f_out:
    for record in ijson.items(f_in, "item"):
        f_out.write(json.dumps(record) + "\n")

# Now chunk with pandas
import pandas as pd

first = True
with pd.read_json("data.ndjson", lines=True, chunksize=10_000) as reader:
    for chunk in reader:
        chunk.to_csv("output.csv", mode="w" if first else "a", header=first, index=False)
        first = False

Approach 4: Split the file first, then convert

Sometimes the simplest approach wins. If you have a 2GB JSON array and you just need it split into manageable pieces:

import ijson
import json

def split_json(json_path, output_prefix, records_per_file=100_000):
    file_num = 0
    buffer = []
    
    with open(json_path, "rb") as f:
        for record in ijson.items(f, "item"):
            buffer.append(record)
            if len(buffer) >= records_per_file:
                out_path = f"{output_prefix}_{file_num:04d}.json"
                with open(out_path, "w") as f_out:
                    json.dump(buffer, f_out)
                print(f"Wrote {out_path} ({len(buffer)} records)")
                buffer.clear()
                file_num += 1
    
    if buffer:
        out_path = f"{output_prefix}_{file_num:04d}.json"
        with open(out_path, "w") as f_out:
            json.dump(buffer, f_out)
        print(f"Wrote {out_path} ({len(buffer)} records)")

split_json("huge_export.json", "chunk", records_per_file=100_000)

Each chunk file is now small enough to import into Excel directly, or convert with the online converter, or process with standard pandas read_json.
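
Once the chunks exist, converting each one is straightforward. A sketch using plain pandas on each piece (the chunk_*.json pattern matches the naming above; output filenames are illustrative):

import glob
import pandas as pd

# Each chunk is small enough to load with a plain read_json
for path in sorted(glob.glob("chunk_*.json")):
    df = pd.read_json(path)
    df.to_csv(path.replace(".json", ".csv"), index=False)
    print(f"Converted {path}")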


Which approach for which file size?

| File size | Situation | Best approach |
|-----------|-----------|---------------|
| Under 50MB | Any format | Online converter — fastest, no setup |
| 50MB–500MB | Standard JSON array | Online converter (streaming mode) |
| 50MB–2GB | NDJSON format | pandas chunksize |
| Any size | Nested objects need flattening | ijson + custom flatten function |
| Over 500MB | Standard JSON array | ijson streaming |
| Over 2GB | Any format | ijson streaming (the only option that stays memory-constant) |
| Need to process in Excel | Large file | Split with ijson → multiple CSV/XLSX files |


Common errors and what they mean

MemoryError in Python

You hit RAM. The file is being fully loaded into memory. Switch to ijson streaming — it reads one record at a time and memory usage stays constant.

json.decoder.JSONDecodeError: Extra data

This usually means the file is NDJSON (multiple JSON objects, one per line) but you used json.load() which expects exactly one JSON value. Use lines=True in pandas or iterate line by line with json.loads().
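
If you'd rather skip pandas entirely, a line-by-line loop with json.loads() does the same job. A minimal sketch (assumes every line shares the keys of the first record):

import csv
import json

with open("data.ndjson", encoding="utf-8") as f_in, open("output.csv", "w", newline="", encoding="utf-8") as f_out:
    writer = None
    for line in f_in:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if writer is None:
            writer = csv.DictWriter(f_out, fieldnames=list(record.keys()), extrasaction="ignore")
            writer.writeheader()
        writer.writerow(record)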

ValueError: Trailing data in pandas

Same as above — NDJSON read without lines=True. Add lines=True to pd.read_json().

Browser tab crashes or goes gray

The JSON file exceeded your browser's heap limit. The in-browser converter handles this better than most tools by streaming, but for files above a few hundred megabytes, switch to Python.

ijson.common.IncompleteJSONError

The JSON file is truncated or corrupt. The download was interrupted, or the export process didn't finish. Verify the file ends correctly — a valid JSON array ends with ].
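
A quick way to check how a multi-gigabyte file ends without opening it in an editor (a sketch that reads only the last 200 bytes; data.json is a placeholder path):

import os

path = "data.json"
with open(path, "rb") as f:
    # Jump to (at most) the last 200 bytes and print them
    f.seek(max(os.path.getsize(path) - 200, 0))
    print(f.read().decode("utf-8", errors="replace"))

A healthy export should end with ] (possibly followed by whitespace).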

Conversion finishes but output CSV is half the rows

The JSON may have records after the point where streaming stopped — usually because the prefix path is wrong. If the JSON is {"results": {"items": [...]}}, the prefix should be results.items.item, not item. To find the right prefix, print the first few parse events ijson emits, as in the sketch below; the prefix column shows the exact path to pass to ijson.items().
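
A sketch of that inspection using ijson.parse, which yields (prefix, event, value) tuples; itertools.islice stops after the first 30 events so the whole file is never read:

import itertools
import ijson

with open("data.json", "rb") as f:
    for prefix, event, value in itertools.islice(ijson.parse(f), 30):
        print(prefix, event, value)

The prefix shown next to the repeated start_map events (for example results.items.item) is the string to pass to ijson.items().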


Performance comparison

For a 500MB JSON file with 1 million records:

| Method | Memory peak | Time | Notes |
|--------|-------------|------|-------|
| json.load() + standard csv | ~3.5GB | ~45s | Crashes on machines with <4GB free RAM |
| ijson streaming | ~35MB | ~90s | Constant memory regardless of file size |
| pandas read_json (no chunk) | ~4GB | ~40s | Crashes same as json.load |
| pandas NDJSON chunksize=10k | ~120MB | ~60s | Requires NDJSON format |
| Online converter | ~600MB browser heap | ~30s | Fastest for files under 500MB |

ijson is slower because it walks the document incrementally and yields Python objects one record at a time, rather than handing the whole buffer to an optimized bulk parser. For files that fit in memory, json.load() is faster. For files that don't, ijson is the only option.


Frequently Asked Questions

How large a JSON file can the online converter handle?

The JSON to CSV converter runs entirely in your browser — no upload, no server limit. In practice, files up to 200-300MB convert reliably on modern machines with 8GB+ RAM. Above 500MB, results depend on your available browser memory. For very large files, Python with ijson is the reliable choice.

Why does my 20MB JSON file crash the converter but 20MB CSV files load fine?

JSON is much more expensive to parse than CSV. A 20MB JSON array of objects requires building an in-memory object tree with keys for every field of every record. A 20MB CSV is read as plain lines. The parsed JSON tree can be 10-20x larger in memory than the raw file size. Some converters also accumulate the entire output string before offering a download, doubling the memory again.

Can I stream JSON directly from a URL without downloading it?

Yes. In Python, ijson works with any file-like object, including HTTP streams:

import ijson
import csv
import urllib.request

url = "https://example.com/large-export.json"

with urllib.request.urlopen(url) as response:
    with open("output.csv", "w", newline="") as f_out:
        writer = None
        for record in ijson.items(response, "item"):
            if writer is None:
                writer = csv.DictWriter(f_out, fieldnames=list(record.keys()))
                writer.writeheader()
            writer.writerow(record)

This streams the response directly to CSV without ever writing the JSON file to disk.

My JSON file has arrays inside objects — how do I handle those while streaming?

Arrays of scalars can be joined to a string: "|".join(str(v) for v in value). Arrays of objects (nested records) are harder — they expand one record into multiple rows, which breaks the simple "one JSON object = one CSV row" assumption. The cleanest approach is to normalize before writing:

for record in ijson.items(f_in, "item"):
    nested_items = record.pop("line_items", [])
    for item in nested_items:
        row = {**record, **{f"item_{k}": v for k, v in item.items()}}
        writer.writerow(row)

This produces one row per nested item, with the parent fields repeated on each row — the same structure the online converter uses when it expands nested arrays.

What's the largest JSON file anyone has converted with this tool?

The online converter has no enforced file size limit. The practical ceiling is available browser memory. Users have converted files in the 400-500MB range successfully. For anything larger, ijson is the recommendation — it has no ceiling.

Does ijson work with all JSON structures?

ijson works with any valid JSON file. The key is specifying the right prefix for ijson.items(). For a bare array [...], use "item". For {"data": [...]}, use "data.item". For {"response": {"results": [...]}}, use "response.results.item". If you're unsure, print the first few events ijson emits (for example with itertools.islice(ijson.parse(f), 20)) to see the structure without parsing the whole file.