DataShift

CSV Cleaner

Paste a broken CSV and get a clean one back. Fixes the specific problems that break database imports and data pipelines: BOM characters, mixed delimiters, unbalanced quotes, ragged rows, and trailing commas. Every fix is logged.

What gets fixed automatically

UTF-8 BOM (byte order mark)

Files exported from Excel often start with the invisible bytes EF BB BF. Most parsers don't handle this and you end up with a garbled first column header like name instead of name. Stripped automatically.

Mixed or wrong delimiters

European CSV exports use semicolons. Some tools output pipes. If your file uses a different delimiter than your parser expects, every row becomes one giant field. Auto-detected and normalized.

Unbalanced and unescaped quotes

A field like She said "hello" without proper escaping breaks RFC 4180 parsing. Unbalanced quotes are detected and repaired so downstream parsers don't choke.

Ragged rows (inconsistent column count)

Row 1 has 5 columns, row 14 has 4. Most database imports reject the entire file. Short rows are padded with empty values to match the header column count.

Trailing commas

Some exporters add a comma after the last value on every row, creating a ghost empty column at the end of every record. Detected and stripped.

Whitespace in cells

Leading and trailing spaces in cell values — " Alice " instead of "Alice" — cause join mismatches in SQL and pandas. Trimmed from every cell.

Why CSV files that open fine in Excel break everywhere else

Excel is forgiving. Your database is not.

Excel silently works around encoding issues, mismatched quotes, and ragged rows. PostgreSQL, MySQL, BigQuery, and pandas are strict — they reject the file or import garbage. Cleaning the file before import is the fix.

The error happens at row 47,000, not row 1.

CSV import errors in large files are painful to debug. A BOM character you can't see, a stray quote in a customer address, a missing value in one row — these show up as cryptic errors mid-import. This tool finds them in seconds.

Example

CRM export with BOM, trailing commas, and inconsistent whitespace

Input (broken)
id , name , email , status ,
1 , Alice Smith , alice@corp.com , active ,
2 , "Bob, Jr." , bob@corp.com , inactive ,
3 , Carol , carol@corp.com , active
Output (clean)
id,name,email,status
1,Alice Smith,alice@corp.com,active
2,"Bob, Jr.",bob@corp.com,inactive
3,Carol,carol@corp.com,active

UTF-8 BOM removed, trailing commas stripped, cell whitespace trimmed. The "What was fixed" panel shows each change — no surprises.

Frequently asked questions

Why does my CSV open fine in Excel but break in my database?

Excel is extremely forgiving — it handles BOM characters, unbalanced quotes, and ragged rows silently. Databases and programming libraries follow RFC 4180 strictly and reject files that don't comply. What looks fine in Excel can be completely broken for any other parser.

What exactly is a BOM character and why does it break things?

A BOM (byte order mark) is 3 invisible bytes (EF BB BF in UTF-8) that some software adds to the beginning of text files to signal the encoding. The problem: most CSV parsers read it as part of the first cell value, turning "name" into "\uFEFFname". Your header column lookup fails, every row fails, the entire import fails.

My CSV uses semicolons as the delimiter — will this work?

Yes. The cleaner auto-detects the delimiter by analyzing which character appears most consistently between quoted fields. Comma, semicolon, tab, and pipe are all detected. You can also manually override the output delimiter in the options bar.

Can this fix encoding issues like mojibake (garbled characters)?

The cleaner can detect and remove the BOM and normalize line endings, but it can't re-encode a file that was saved in the wrong encoding. If you're seeing garbled characters like é instead of é, the file was saved as Latin-1 but read as UTF-8. You need to re-export from the source with UTF-8 encoding selected.

Does cleaning change the meaning of any data?

No. The cleaner only fixes structural issues — it doesn't modify cell values, reorder rows, or filter records. It trims whitespace from cells, but doesn't change the actual content. Every fix is shown in the 'What was fixed' summary so you can verify what changed.