YAML vs JSON vs CSV: When to Use Each Format
A practical comparison of YAML, JSON, and CSV — what each format is good at, where each fails, and how to pick the right one for config files, APIs, spreadsheets, and data pipelines.
- Quick decision table
- What each format looks like
- CSV: the format for tabular data
- What CSV is good at
- What CSV is bad at
- CSV in practice
- JSON: the format for APIs and data exchange
- What JSON is good at
- What JSON is bad at
- JSON in practice
- YAML: the format for human-edited configuration
- What YAML is good at
- What YAML is bad at
- YAML in practice
- Detailed comparison
- When the lines blur
- Simple flat records: CSV or JSON?
- Configuration files: JSON or YAML?
- Config with comments: YAML or JSON with a comment hack?
- When you need both structure and spreadsheets
- Conversion reference
- Frequently Asked Questions
- Is JSON faster to parse than CSV?
- Can YAML replace JSON for APIs?
- Why does my YAML parser turn `NO` into `false`?
- Which format is best for storing data long-term?
- When should I use NDJSON instead of regular JSON?
- Can I validate CSV the way I can validate JSON Schema?
Three formats, three different jobs. YAML is for configuration. JSON is for APIs and data exchange. CSV is for tabular data and spreadsheets. That's the short answer — but the boundaries blur constantly in practice, and picking the wrong format creates real problems: type errors, unreadable config files, bloated payloads, tools that refuse to import your data.
This guide explains what each format is optimized for, where each breaks down, and how to decide which one to use when the choice isn't obvious.
Quick decision table
| You need to... | Use |
|----------------|-----|
| Configure an application (Docker, Kubernetes, CI/CD) | YAML |
| Send data between a server and a browser | JSON |
| Call or build a REST API | JSON |
| Store hierarchical data with types | JSON |
| Export data to Excel or Google Sheets | CSV |
| Feed data into pandas, R, or SQL | CSV |
| Share data with non-technical users | CSV |
| Write human-editable config with comments | YAML |
| Represent a simple flat list of records | CSV or JSON (both work) |
| Represent nested or mixed-type data | JSON or YAML |
What each format looks like
The same dataset in all three formats:
CSV:
id,name,role,active
1,Alice Chen,Engineering Lead,true
2,Bob Kumar,Designer,true
3,Sara Mills,VP Engineering,false
JSON:
[
  {"id": 1, "name": "Alice Chen", "role": "Engineering Lead", "active": true},
  {"id": 2, "name": "Bob Kumar", "role": "Designer", "active": true},
  {"id": 3, "name": "Sara Mills", "role": "VP Engineering", "active": false}
]
YAML:
- id: 1
  name: Alice Chen
  role: Engineering Lead
  active: true
- id: 2
  name: Bob Kumar
  role: Designer
  active: true
- id: 3
  name: Sara Mills
  role: VP Engineering
  active: false
All three represent the same three records. The differences become significant when the data gets more complex, when humans need to edit it, or when tools need to process it at scale.
CSV: the format for tabular data
CSV (Comma-Separated Values) is a plain-text format for two-dimensional data — rows and columns. It has been around since the 1970s and is supported by virtually every tool that handles data: Excel, Google Sheets, pandas, R, every SQL database, every BI tool.
What CSV is good at
Spreadsheet compatibility. CSV is the universal import/export format for spreadsheet applications. No other format comes close. If a non-technical person needs to open your data in Excel, CSV is the answer.
Data pipeline input. pandas read_csv, R's read.csv, SQL LOAD DATA INFILE, Apache Spark's CSV reader — all are optimized for CSV. The format is simple enough that parsers are extremely fast.
File size. For tabular data, CSV is compact. The same 1000-row dataset is significantly smaller as CSV than as JSON (which repeats every key on every row) or YAML.
Simplicity. You can write, read, and edit CSV in any text editor. No special tools required.
What CSV is bad at
Types. CSV has no type system. Every value is a string. 95000 and "95000" are indistinguishable. Parsers infer types from content, which means NO (valid two-letter country code for Norway) gets read as false in some tools, and 2024-01-01 might become a date or stay a string depending on the parser. See the CSV vs Excel comparison for more on type gotchas.
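A minimal pandas sketch of the problem: type inference silently rewrites values such as ZIP codes with leading zeros, and passing `dtype=str` is the usual guard.

```python
import io
import pandas as pd

csv_data = "zip,city\n01234,Springfield\n94105,San Francisco"

# Default inference coerces the zip column to integers, dropping the leading zero
df = pd.read_csv(io.StringIO(csv_data))
print(df["zip"].tolist())  # [1234, 94105]

# dtype=str keeps every value exactly as written
df = pd.read_csv(io.StringIO(csv_data), dtype=str)
print(df["zip"].tolist())  # ['01234', '94105']
```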
Hierarchical data. CSV is flat. You can represent nested data by adding columns (address.city, address.state) or by repeating parent fields on each child row — but neither is clean. JSON is the right tool for nested data.
Schema. CSV has no way to declare types, required fields, or validation rules. Every consumer has to infer or hardcode the schema.
Encoding. CSV has no defined encoding. Excel defaults to Windows-1252 on Windows, which breaks any non-ASCII character when the file was written as UTF-8. Always write CSV with a UTF-8 BOM if Excel will open it. See the encoding guide for the full picture.
CSV in practice
import pandas as pd
# Reading
df = pd.read_csv("employees.csv")
# Writing — utf-8-sig adds BOM for Excel compatibility
df.to_csv("output.csv", index=False, encoding="utf-8-sig")
Convert between CSV and JSON with the CSV to JSON converter or JSON to CSV converter. Convert CSV to Excel with the CSV to Excel converter.
JSON: the format for APIs and data exchange
JSON (JavaScript Object Notation) is a text format for structured data. It was designed as a lightweight alternative to XML for browser-server communication and has become the dominant format for REST APIs, configuration with type information, and data interchange between systems.
What JSON is good at
Type information. JSON distinguishes strings, numbers, booleans, null, arrays, and objects. 95000 is a number, "95000" is a string, true is a boolean — not ambiguous. This eliminates an entire category of bugs when passing data between systems.
Hierarchical data. JSON handles arbitrary nesting naturally. A user object can contain an address object which contains a coordinates array. No flattening required.
API standard. REST APIs overwhelmingly use JSON. JSON is the lingua franca of web services.
Language support. JSON.parse() in JavaScript, json.loads() in Python, json.Unmarshal (encoding/json) in Go, Jackson or Gson in Java — parsing JSON is a solved problem in every mainstream language, either in the standard library or via a de facto standard package.
Compact for nested data. For records that have nested objects, JSON is more compact than equivalent YAML. For flat tabular data, CSV is more compact than JSON.
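Type preservation is easy to verify with Python's standard `json` module: a round trip keeps numbers, strings, booleans, and null distinct.

```python
import json

record = {"id": 1, "salary": 95000, "code": "95000", "active": True, "manager": None}
restored = json.loads(json.dumps(record))

# int stays int, string stays string, bool and None survive unchanged
print(type(restored["salary"]).__name__)  # int
print(type(restored["code"]).__name__)    # str
print(restored["active"], restored["manager"])  # True None
```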
What JSON is bad at
Comments. JSON has no comment syntax. You cannot annotate a JSON config file with explanations. This is the main reason YAML is preferred for human-maintained configuration files.
Readability at scale. A 10,000-line JSON file with deeply nested objects is difficult to read and edit by hand. The required closing braces accumulate at the bottom and it's easy to lose your place.
Verbosity for tabular data. JSON repeats every key on every row. A 100,000-row dataset as JSON is significantly larger than as CSV, and slower to parse, because the parser handles key-value pairs instead of positional columns.
Trailing commas. JSON does not allow trailing commas. The last item in an array or object cannot have a comma after it. This is the single most common JSON syntax error when humans edit JSON files by hand.
{
  "name": "Alice",
  "role": "Engineer",  ← trailing comma — invalid JSON
}
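Strict parsers reject it immediately; a quick check with Python's standard `json` module:

```python
import json

bad = '{"name": "Alice", "role": "Engineer",}'
try:
    json.loads(bad)
except json.JSONDecodeError as e:
    # The parser fails at the trailing comma rather than silently accepting it
    print("invalid JSON:", e.msg)
```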
JSON in practice
import json
# Reading
with open("data.json") as f:
    data = json.load(f)
# Writing
with open("output.json", "w") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)
Use the JSON formatter to validate and pretty-print JSON. Convert to other formats with JSON to CSV, JSON to Excel, or JSON to YAML.
YAML: the format for human-edited configuration
YAML (YAML Ain't Markup Language) is a superset of JSON with additional syntax designed for human readability. Every valid JSON document is also valid YAML. YAML adds comments, multi-line strings, anchors, aliases, and a less ceremonious syntax.
What YAML is good at
Comments. YAML supports comments with #. This is why it dominates application configuration — you can explain why a setting exists, not just what it is:
server:
  port: 8080
  # Increase timeout for slow upstream services
  timeout_ms: 30000
  workers: 4
Human writability. YAML doesn't require quotes around most strings, uses indentation instead of braces, and reads like structured prose. For config files that developers edit daily, this reduces friction significantly.
Multi-line strings. YAML has clean syntax for literal blocks and folded blocks:
description: |
  This is a multi-line string.
  Each line is preserved exactly.
  Including this one.
summary: >
  This is a folded string.
  These lines get joined with spaces.
  Into a single paragraph.
JSON requires \n escape sequences for newlines in strings. YAML's block scalars are far more readable for long text.
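The difference is easy to observe at runtime; this sketch assumes the third-party PyYAML package is installed.

```python
import yaml  # PyYAML, a third-party package

doc = """\
description: |
  line one
  line two
summary: >
  line one
  line two
"""
data = yaml.safe_load(doc)
# Literal block (|) preserves newlines; folded block (>) joins lines with spaces
print(repr(data["description"]))  # 'line one\nline two\n'
print(repr(data["summary"]))      # 'line one line two\n'
```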
Anchors and aliases (DRY config). YAML lets you define a value once and reference it multiple times:
defaults: &defaults
  timeout_ms: 5000
  retries: 3
production:
  <<: *defaults
  url: https://api.example.com
staging:
  <<: *defaults
  url: https://staging.example.com
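Loading such a document with PyYAML (assumed installed) shows the merge in action: each environment inherits the defaults and adds its own keys.

```python
import yaml  # PyYAML, a third-party package

doc = """\
defaults: &defaults
  timeout_ms: 5000
  retries: 3
production:
  <<: *defaults
  url: https://api.example.com
"""
cfg = yaml.safe_load(doc)
# The merge key (<<) pulls the anchored defaults into the production mapping
print(cfg["production"]["timeout_ms"], cfg["production"]["url"])  # 5000 https://api.example.com
```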
What YAML is bad at
The Norway problem. YAML 1.1 (still used by many parsers) interprets unquoted NO, YES, ON, OFF, TRUE, FALSE, Y, N as booleans. This means the two-letter country code for Norway (NO) becomes false without any warning. YAML 1.2 fixes this, but parser support is inconsistent.
country: NO # Parses as false in YAML 1.1 parsers
country: "NO" # Correct — explicit string
enabled: yes # true
active: on # true
port: 8080 # integer
version: 1.0 # float (not string "1.0")
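The coercion is observable directly with PyYAML, which implements YAML 1.1 tag resolution:

```python
import yaml  # PyYAML follows YAML 1.1 boolean resolution

print(yaml.safe_load("country: NO"))    # {'country': False}
print(yaml.safe_load('country: "NO"'))  # {'country': 'NO'}
print(yaml.safe_load("enabled: yes"))   # {'enabled': True}
```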
Significant whitespace. Indentation is structural in YAML. A misplaced space changes the meaning of the document. Tabs are not allowed for indentation. This makes YAML sensitive to editor settings and copy-paste errors.
Complexity. YAML's full specification is surprisingly large. Features like anchors, aliases, merge keys, custom tags, and multiple document streams make YAML parsers significantly more complex than JSON parsers. Different parsers implement different subsets, leading to portability issues.
Not suitable for data exchange. YAML is designed for human-edited files, not machine-generated data. Parsing YAML is slower than JSON, tooling is less universal, and the type coercion surprises make it unsafe for untrusted input.
YAML in practice
# docker-compose.yml
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./static:/usr/share/nginx/html:ro
    environment:
      - NODE_ENV=production
Convert between YAML and JSON with the JSON to YAML converter or YAML to JSON converter.
Detailed comparison
| Property | CSV | JSON | YAML |
|----------|-----|------|------|
| Type system | None (strings only) | String, number, boolean, null, array, object | Same as JSON + date, binary, custom types |
| Comments | No | No | Yes (#) |
| Nesting | Flat only | Arbitrary depth | Arbitrary depth |
| Human readability | Good for tables | Medium | Best |
| Human writability | Easy for tables | Error-prone (trailing commas, brackets) | Easy, but whitespace-sensitive |
| Machine readability | Fast | Fast | Slower |
| Compactness | Best for tabular | Verbose for tabular (repeated keys) | Similar to JSON |
| Spreadsheet support | Native | Requires conversion | Requires conversion |
| API standard | Rare | Dominant | Rare |
| Config files | Not used | Used (package.json, tsconfig) | Dominant (Docker, K8s, GitHub Actions) |
| Streaming | Easy (line by line) | Possible (NDJSON) | Possible but uncommon |
| Schema validation | No native standard | JSON Schema | JSON Schema (via JSON conversion) |
| Error detectability | Silent (wrong column count) | Syntax errors caught immediately | Syntax errors, silent type coercion |
| Spec complexity | Low | Low | High |
When the lines blur
Simple flat records: CSV or JSON?
For a list of flat records with consistent fields, both work. Prefer:
- CSV if humans will open it in a spreadsheet, or it's going into a database or data science tool
- JSON if it's going to an API, a JavaScript frontend, or needs type preservation (numbers, booleans, null)
Configuration files: JSON or YAML?
- JSON for configs that are mostly machine-generated or rarely edited: package.json, tsconfig.json, manifest.json. JSON's strict syntax means fewer surprises.
- YAML for configs developers edit frequently and that benefit from comments: docker-compose.yml, .github/workflows/*.yml, Kubernetes manifests, Ansible playbooks. The readability and comment support outweigh the quirks.
Config with comments: YAML or JSON with a comment hack?
Some tools extend JSON with comment support (JSON5, JSONC). If your toolchain supports it, JSONC can give you comments without YAML's whitespace sensitivity. Otherwise, YAML is the standard choice.
When you need both structure and spreadsheets
Sometimes you have structured JSON data that also needs to go into Excel. The JSON to Excel converter handles this directly, flattening nested objects to dot-notation columns. You don't need to manually convert to CSV first.
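With pandas, the same flattening can be sketched using `json_normalize`, which expands nested objects into dot-notation columns ahead of an Excel export:

```python
import pandas as pd

records = [
    {"id": 1, "name": "Alice", "address": {"city": "Oslo", "country": "NO"}},
]
df = pd.json_normalize(records)  # nested keys become dot-notation columns
print(list(df.columns))  # ['id', 'name', 'address.city', 'address.country']
# df.to_excel("output.xlsx", index=False)  # requires the openpyxl package
```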
Conversion reference
| From → To | Tool |
|-----------|------|
| JSON → CSV | json-to-csv |
| CSV → JSON | csv-to-json |
| JSON → Excel | json-to-excel |
| CSV → Excel | csv-to-excel |
| JSON → YAML | json-to-yaml |
| YAML → JSON | yaml-to-json |
Frequently Asked Questions
Is JSON faster to parse than CSV?
For simple flat data, CSV is typically faster to parse than JSON — the parser only reads positional values, not key-value pairs. For nested or variable-structure data, JSON parsers are highly optimized and the overhead is minimal in practice. The difference rarely matters at less than 1 million rows.
Can YAML replace JSON for APIs?
Technically yes — YAML is a superset of JSON and any JSON-capable system can be extended to accept YAML. In practice, almost no REST APIs use YAML because JSON has universal parser support at near-zero overhead, while YAML is slower to parse and has surprising type coercion behavior that's dangerous for untrusted input.
Why does my YAML parser turn NO into false?
This is YAML 1.1 behavior. The spec listed no, NO, No as boolean false values (along with yes, on, off, etc.). YAML 1.2 removed this, but many parsers haven't updated. Quote the value: country: "NO".
Which format is best for storing data long-term?
For structured records with types: JSON. It's simple, widely supported, and the spec won't surprise you. For tabular data that needs to be opened in spreadsheets now or in the future: CSV — it's readable by every tool made in the last 40 years. Avoid YAML for stored data; its implicit type coercion makes it risky for anything you'll read back programmatically.
When should I use NDJSON instead of regular JSON?
NDJSON (one JSON object per line, no outer array) is better for streaming, logging, and data pipelines where you process records one at a time. Regular JSON arrays are better for API responses, config, and files where you need the complete document before processing. See the CSV to JSON Python guide for how to write NDJSON with pandas.
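A short pandas sketch of the NDJSON round trip; `orient="records", lines=True` switches between the two shapes:

```python
import io
import json
import pandas as pd

df = pd.DataFrame([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])

# One JSON object per line, no outer array
ndjson = df.to_json(orient="records", lines=True)
for line in ndjson.strip().splitlines():
    print(json.loads(line))

# Reading it back, one record per line
restored = pd.read_json(io.StringIO(ndjson), lines=True)
```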
Can I validate CSV the way I can validate JSON Schema?
Not natively. CSV has no built-in schema mechanism. For CSV validation, tools like csvlint and the Python csvvalidator library let you define expected columns and types. For stricter validation, convert to JSON first and then apply JSON Schema validation using the JSON Schema Validator.