What Is CSV? (Comma-Separated Values) Explained

CSV (Comma-Separated Values) is a plain text format for tabular data. Each record represents a row and normally occupies one line; field values within a row are separated by a delimiter (usually a comma). CSV is the most widely supported format for spreadsheet data exchange: it is readable by Excel, Google Sheets, pandas, and virtually every database system.

CSV Syntax Rules (RFC 4180)

Each record is on a separate line, terminated by a line break (CRLF in RFC 4180). Fields are separated by commas. Fields containing commas, double quotes, or line breaks must be enclosed in double quotes. A double quote inside a quoted field is represented as two consecutive double quotes (""). The first record may be an optional header row with column names, and every record should have the same number of fields.
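The quoting rules above can be seen in a short round trip through Python's standard csv module. This is a minimal sketch, not a full RFC 4180 validator; the sample rows and variable names are made up for illustration.

```python
import csv
import io

# Sample data (hypothetical): one field has a comma, one has double quotes,
# and one has an embedded line break.
rows = [
    ["name", "quote", "notes"],
    ["Smith, Jane", 'She said "hi"', "line one\nline two"],
]

# newline="" stops Python from translating the line endings the writer emits.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)  # default dialect: comma delimiter, CRLF endings, minimal quoting
writer.writerows(rows)
encoded = buffer.getvalue()

# Reading the text back should recover the original fields exactly.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))

print(encoded)
print(parsed == rows)
```

Only the fields that need quoting are quoted, and the embedded double quotes are doubled, so the header line is written without any quotes at all.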

Delimiter Variations

Despite the name, CSV files sometimes use semicolons (;) as delimiters. This is common in European locales where the comma is the decimal separator. Tab-separated values (TSV, .tsv) avoid delimiter conflicts in data that naturally contains commas. Always verify the delimiter when parsing unfamiliar CSV files, especially those generated by Excel, which uses the list separator from the system's regional settings.
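One way to verify the delimiter programmatically is Python's csv.Sniffer, which guesses the dialect from a sample of the file. The sketch below assumes the candidates are limited to semicolon, comma, and tab; the sample text is invented. The sniffer is a heuristic and can fail or guess wrong on short or irregular samples, so treat its answer as a hint rather than a guarantee.

```python
import csv
import io

# Hypothetical sample in a European-style export: semicolon delimiter,
# comma as the decimal separator inside the price column.
sample = "name;price;city\nWidget;1,50;Berlin\nGadget;12,75;Paris\n"

# Restricting the candidate delimiters makes the guess more reliable.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")

# Parse the sample using the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(repr(dialect.delimiter))
print(rows)
```

With the semicolon detected, a value such as 1,50 stays in one field instead of being split at the decimal comma.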

CSV vs JSON for APIs

CSV is excellent for large tabular datasets: it is efficient to generate and stream, has low overhead, and is directly importable into spreadsheets and databases. JSON handles nested and hierarchical data that CSV cannot represent, and is self-describing (field names are in the data). For tabular data in bulk exports and analytics pipelines, CSV usually produces smaller files than the equivalent JSON, because column names appear once in the header rather than in every record.
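The size difference is easy to check on a small synthetic dataset. The sketch below serializes the same flat records both ways and compares byte counts; the record shape (id, name, score) is an assumption chosen for illustration, and real ratios depend on the data.

```python
import csv
import io
import json

# Synthetic flat records (hypothetical shape).
records = [{"id": i, "name": f"user{i}", "score": i * 1.5} for i in range(1000)]

# CSV: field names are written once, in the header row.
csv_buffer = io.StringIO(newline="")
writer = csv.DictWriter(csv_buffer, fieldnames=["id", "name", "score"])
writer.writeheader()
writer.writerows(records)
csv_size = len(csv_buffer.getvalue().encode("utf-8"))

# JSON: field names are repeated in every object, even in compact form.
json_text = json.dumps(records, separators=(",", ":"))
json_size = len(json_text.encode("utf-8"))

print(f"CSV bytes:  {csv_size}")
print(f"JSON bytes: {json_size}")
```

The gap comes almost entirely from the repeated keys, so it tends to widen as the number of columns or the length of their names increases.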

CSV Encoding Pitfalls

CSV files lack a built-in encoding declaration. Excel defaults to the system locale encoding (often Windows-1252), while most developers expect UTF-8. This mismatch is a common source of garbled characters in international data. To make Excel open a UTF-8 CSV correctly, prepend the UTF-8 byte order mark (BOM, U+FEFF, encoded as the bytes 0xEF 0xBB 0xBF) to the file.
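In Python, the "utf-8-sig" codec writes the BOM automatically and strips it again on reading. The following is a minimal sketch; the file name cities.csv and the sample rows are placeholders, and the file is written to a temporary directory.

```python
import csv
import os
import tempfile

# Sample rows with non-ASCII characters (hypothetical data).
rows = [["city", "country"], ["Zürich", "Schweiz"], ["São Paulo", "Brasil"]]

path = os.path.join(tempfile.mkdtemp(), "cities.csv")

# "utf-8-sig" prepends the bytes EF BB BF before the UTF-8 encoded content.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

# Confirm the BOM is present at the start of the raw bytes.
with open(path, "rb") as f:
    raw = f.read()
starts_with_bom = raw.startswith(b"\xef\xbb\xbf")

# Reading with "utf-8-sig" removes the BOM so it does not leak into the first field.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    round_tripped = list(csv.reader(f))

print(starts_with_bom)
print(round_tripped == rows)
```

Reading such a file with plain "utf-8" instead would leave the BOM attached to the first header name, which is a frequent cause of a column lookup failing for no visible reason.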