How to parse CSV data?

Question

Where could I find some JavaScript code to parse CSV data?

Here's a [JavaScript function that parses CSV data, accounting for commas found inside quotes](http://stackoverflow.com/questions/7431268/how-read-data-from-csv-file-using-javascript/22850815#22850815) — curran, Apr 03 '14 at 23:25
[Papa Parse](http://papaparse.com/) is another option with a lot of features (multi-threaded, header row support, auto-detect delimiter, and more) — Hinrich, Aug 24 '15 at 19:13

score 308 · Accepted Answer · edited Mar 25 '23 at 12:18

You can use the CSVToArray() function mentioned in this blog entry.

      console.log(CSVToArray(`"foo, the column",bar
2,3
"4, the value",5`));

      // ref: http://stackoverflow.com/a/1293163/2343
        // This will parse a delimited string into an array of
        // arrays. The default delimiter is the comma, but this
        // can be overridden in the second argument.
        function CSVToArray( strData, strDelimiter ){
            // Check to see if the delimiter is defined. If not,
            // then default to comma.
            strDelimiter = (strDelimiter || ",");
     
            // Create a regular expression to parse the CSV values.
            var objPattern = new RegExp(
                (
                    // Delimiters.
                    "(\\" + strDelimiter + "|\\r?\\n|\\r|^)" +
     
                    // Quoted fields.
                    "(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|" +
     
                    // Standard fields.
                    "([^\"\\" + strDelimiter + "\\r\\n]*))"
                ),
                "gi"
                );
     
     
            // Create an array to hold our data. Give the array
            // a default empty first row.
            var arrData = [[]];
     
            // Create an array to hold our individual pattern
            // matching groups.
            var arrMatches = null;
     
     
            // Keep looping over the regular expression matches
            // until we can no longer find a match.
            while (arrMatches = objPattern.exec( strData )){
     
                // Get the delimiter that was found.
                var strMatchedDelimiter = arrMatches[ 1 ];
     
                // Check to see if the given delimiter has a length
                // (is not the start of string) and if it matches
                // the field delimiter. If it does not, then we know
                // that this delimiter is a row delimiter.
                if (
                    strMatchedDelimiter.length &&
                    strMatchedDelimiter !== strDelimiter
                    ){
     
                    // Since we have reached a new row of data,
                    // add an empty row to our data array.
                    arrData.push( [] );
     
                }
     
                var strMatchedValue;
     
                // Now that we have our delimiter out of the way,
                // let's check to see which kind of value we
                // captured (quoted or unquoted).
                if (arrMatches[ 2 ]){
     
                    // We found a quoted value. When we capture
                    // this value, unescape any double quotes.
                    strMatchedValue = arrMatches[ 2 ].replace(
                        new RegExp( "\"\"", "g" ),
                        "\""
                        );
     
                } else {
     
                    // We found a non-quoted value.
                    strMatchedValue = arrMatches[ 3 ];
     
                }
     
     
                // Now that we have our value string, let's add
                // it to the data array.
                arrData[ arrData.length - 1 ].push( strMatchedValue );
            }
     
            // Return the parsed data.
            return( arrData );
        }

This can handle embedded commas, quotes, and line breaks, e.g.: `var csv = 'id, value\n1, James\n02,"Jimmy Smith, Esq."\n003,"James ""Jimmy"" Smith, III"\n0004,"James\nSmith\nWuz Here"'; var array = CSVToArray(csv, ",");` — prototype, Jun 06 '12 at 20:27
This has worked for a wide variety of files. I find that it appends an extra blank row at the end. I'm not sure this is the best way to do it, but I suggest adding a check that the new line has at least one value, even if explicitly blank (`""`): ` if (arrMatches[2] || arrMatches[3]) { arrData.push([]); rowCount++; } else { break; } }` — prototype, Oct 30 '12 at 03:00
It gives `undefined` for __empty fields__ that are __quoted__. Example: `CSVToArray("4,,6")` gives me `[["4","","6"]]`, but `CSVToArray("4,\"\",6")` gives me `[["4",undefined,"6"]]`. — Pang, Nov 14 '12 at 04:36
I've had issues with this in firefox, and the script has become unresponsive. It seemed to only affect a few users though, so couldn't find the cause — JDandChips, Mar 18 '13 at 11:51
Thanks, beautiful small parser using regular expressions! CSVToArray('user;host;password\npete;192.168.1.1;"semi;colon"\nfrank;192.168.1.2;"quote\"missing"') unfortunately doesn't parse the last data row correctly. Any idea? — John Rumpel, Aug 19 '15 at 20:30
There is a bug in the regex: `"([^\"\\"` should be `"([^\\"`. Otherwise a double quote anywhere in an unquoted value will prematurely end it. Found this the hard way... — Walter Tross, Nov 30 '15 at 21:52
This solution poses a problem when the first value of the string is empty like `',b,,d,,e,'`. Adding `if (!arrMatches.index && arrMatches[0].charAt(0)=== strDelimiter) {arrData.push('');}` before the `if (arrMatches[ 2 ]){` fixes the problem — Guilherme Lopes, Dec 12 '16 at 20:25
For anyone looking for a reduced version of the above method, with the regex fix described above applied: https://gist.github.com/Jezternz/c8e9fafc2c114e079829974e3764db75 — Josh Mc, Sep 23 '18 at 01:39
Borrowed from @JoshMc (thanks!) and added header capability and more robust character escaping. See https://gist.github.com/plbowers/7560ae793613ee839151624182133159 — Peter Bowers, Dec 29 '18 at 12:18
This solution will get stuck in an infinite loop when the input is an empty string. — Tim Iles, Mar 15 '21 at 18:46
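
The comments above report three rough edges: quoted empty fields come back as `undefined`, a trailing newline produces an extra blank row, and an empty input string never terminates. One way to work around them without touching the regular expression is to guard the call and tidy the result afterwards. This is only a sketch: it assumes a parser with the same return shape as `CSVToArray()` (an array of row arrays), and `cleanParsedRows` and `safeParse` are hypothetical helper names that do not appear in the answer.

```javascript
// Hypothetical helpers (not part of the answer); they post-process the
// output of a CSVToArray-style parser.

// Replace undefined cells with "" and drop one trailing blank row.
function cleanParsedRows(rows) {
  const cleaned = rows.map(row => row.map(cell => (cell === undefined ? "" : cell)));
  const last = cleaned[cleaned.length - 1];
  // A final row holding a single empty cell is what a trailing newline produces.
  if (cleaned.length > 1 && last.length === 1 && last[0] === "") {
    cleaned.pop();
  }
  return cleaned;
}

// Guard against empty input, which would otherwise loop forever.
function safeParse(parse, text, delimiter) {
  if (text === "") return [];
  return cleanParsedRows(parse(text, delimiter));
}

// Stand-in data shaped like the output reported in the comments:
console.log(cleanParsedRows([["4", undefined, "6"], [""]]));
// → [ [ '4', '', '6' ] ]
```

With the answer's function in scope, the call would be `safeParse(CSVToArray, csvText, ",")`. Note the trade-off: a genuine final record consisting of a single empty field is dropped as well.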

score 156 · Answer 2 · edited Oct 07 '21 at 05:46

jQuery-CSV

It's a jQuery plugin designed to work as an end-to-end solution for parsing CSV into JavaScript data. It handles every single edge case presented in RFC 4180, as well as some that pop up for Excel/Google spreadsheet exports (i.e., mostly involving null values) that the specification is missing.

Example:

track,artist,album,year

Dangerous,'Busta Rhymes','When Disaster Strikes',1997

// Calling this
music = $.csv.toArrays(csv)

// Outputs...
[
  ["track", "artist", "album", "year"],
  ["Dangerous", "Busta Rhymes", "When Disaster Strikes", "1997"]
]

console.log(music[1][2]) // Outputs: 'When Disaster Strikes'

Update:

Oh yeah, I should also probably mention that it's completely configurable.

music = $.csv.toArrays(csv, {
  delimiter: "'", // Sets a custom value delimiter character
  separator: ';', // Sets a custom field separator character
});

Update 2:

It now works with jQuery on Node.js too. So you have the option of doing either client-side or server-side parsing with the same library.

Update 3:

Since the Google Code shutdown, jquery-csv has been migrated to GitHub.

Disclaimer: I am also the author of jQuery-CSV.

answered Apr 24 '12 at 01:24

Evan Plaice


Why is it jQuery csv? Why does it depend on jQuery? I've had a quick scan through the source... it doesn't look like you're using jQuery – paulslater19 May 10 '12 at 08:20
This is great. Would be useful to extend this to handle embedded line breaks or escaped double quotes, e.g. "James ""Jimmy"" Smith" or "Embedded\nLine\nBreaks" – prototype Jun 06 '12 at 20:18
@user645715 It handles both. If you take a look at the test runner (http://jquery-csv.googlecode.com/git/test/test.html) it outlines all of the edge cases that the plugin covers. – Evan Plaice Nov 19 '12 at 20:28
@paulslater19 The plugin doesn't depend on jquery. Rather, it follows the common jQuery development guidelines. All of the methods included are static and reside under their own namespace (ie $.csv). To use them without jQuery simply create a global $ object that the plugin will bind to during initialization. – Evan Plaice Nov 19 '12 at 20:48
Does `csv` in the solution code refer to the `.csv` filename? I'm interested in a good JS/jQuery tool to parse a CSV file – bouncingHippo Nov 22 '12 at 19:59
@bouncingHippo In the example it's just referring to a string of csv data but the lib can be used to open csv files locally in the browser using the HTML5 File API. Here's an example of it in action http://jquery-csv.googlecode.com/git/examples/file-handling.html. – Evan Plaice Nov 23 '12 at 22:09
@bouncingHippo lol, thanks. Have fun with HTML5 File API. just keep in mind that not all browsers (looking @ you IE) support it yet. – Evan Plaice Jan 31 '13 at 01:22
heh heh reticulating splines – MikeMurko Feb 06 '13 at 17:22
Is this (very nice) library still maintained (as it is now archived at google code)? – WoJ Jan 13 '16 at 09:31
Yes, it has moved to Github. I haven't had the time to migrate all of the documentation yet, which is why the link still points to Google Code. Here's the GH link https://github.com/evanplaice/jquery-csv. – Evan Plaice Jan 13 '16 at 16:10
@EvanPlaice What about its support for React? – Adil Mar 01 '18 at 08:10
Given that it's not dependent on jQuery, it would be better to remove the global "$" dependency and let users pass any object reference they want. Perhaps default to jQuery if it's available. There are other libraries that use "$" and it might be used by development teams with minimal proxies of those libraries. – RobG Jul 08 '18 at 06:47
@RobG Yes. Ideally it would be best to rebrand the library altogether, remove any reference to jQuery, and rewrite the codebase in Typescript so it can be transpiled down to support both ESM and Typescript with type definitions. Time to develop and lock down a new name are the limiting factors here. – Evan Plaice Jul 11 '18 at 15:55

Trevor Dixon · Answer 3 · 2023-03-03T13:39:27.000

63

Here's an extremely simple CSV parser that handles quoted fields with commas, new lines, and escaped double quotation marks. There's no splitting or regular expression. It scans the input string 1-2 characters at a time and builds an array.

Test it at http://jsfiddle.net/vHKYH/.

function parseCSV(str) {
    const arr = [];
    let quote = false;  // 'true' means we're inside a quoted field

    // Iterate over each character, keep track of current row and column (of the returned array)
    for (let row = 0, col = 0, c = 0; c < str.length; c++) {
        let cc = str[c], nc = str[c+1];        // Current character, next character
        arr[row] = arr[row] || [];             // Create a new row if necessary
        arr[row][col] = arr[row][col] || '';   // Create a new column (start with empty string) if necessary

        // If the current character is a quotation mark, and we're inside a
        // quoted field, and the next character is also a quotation mark,
        // add a quotation mark to the current column and skip the next character
        if (cc == '"' && quote && nc == '"') { arr[row][col] += cc; ++c; continue; }

        // If it's just one quotation mark, begin/end quoted field
        if (cc == '"') { quote = !quote; continue; }

        // If it's a comma and we're not in a quoted field, move on to the next column
        if (cc == ',' && !quote) { ++col; continue; }

        // If it's a newline (CRLF) and we're not in a quoted field, skip the next character
        // and move on to the next row and move to column 0 of that new row
        if (cc == '\r' && nc == '\n' && !quote) { ++row; col = 0; ++c; continue; }

        // If it's a newline (LF or CR) and we're not in a quoted field,
        // move on to the next row and move to column 0 of that new row
        if (cc == '\n' && !quote) { ++row; col = 0; continue; }
        if (cc == '\r' && !quote) { ++row; col = 0; continue; }

        // Otherwise, append the current character to the current column
        arr[row][col] += cc;
    }
    return arr;
}

answered Feb 20 '13 at 23:22


It's simple and it works for me. The only thing I changed was adding a trim() to the value :) – JustEngland Jan 21 '14 at 22:01
This seems cleaner and more straightforward. I had to parse a 4 MB file and the other answers crashed on me in IE8, but this managed it. – Charles Clayton Jun 22 '14 at 15:26
This also worked for me. I had to do one modification though to allow proper handling of line feeds: `if (cc == '\r' && nc == '\n' && !quote) { ++row; col = 0; ++c; continue; } if (cc == '\n' && !quote) { ++row; col = 0; continue; }` – user655063 Apr 08 '15 at 13:00
Concerning my previous comment: see also e.g. https://theonemanitdepartment.wordpress.com/2014/12/15/the-absolute-minimum-everyone-working-with-data-absolutely-positively-must-know-about-file-types-encoding-delimiters-and-data-types-no-excuses/ and https://en.wikipedia.org/wiki/Newline . This might also explain why @JustEngland had to add a trim() to the value. – user655063 Apr 08 '15 at 13:07
Another user (@sorin-postelnicu) helpfully published a companion function to turn the result into a dictionary object: http://jsfiddle.net/8t2po6wh/. – Trevor Dixon Oct 20 '17 at 12:37
Yeah, anytime speed is needed or memory footprints matter, a clean solution like this is far superior. State machine-esque parsing is so much smoother. – Tatarize Aug 09 '18 at 05:22
here's my version of this with a headers option you can use to parse rows into objects keyed by the header https://gist.github.com/atomkirk/eccb66f77b306d0d1fcecb2c605bd22e – atomkirk Aug 17 '18 at 21:09
Beautiful code, and this is perfect for when you just want to deal with a fixed format without importing a library. – Toper Aug 21 '18 at 16:43
Turn it into a generator that yields and you got yourself a way to handle bigger data: [example](https://jsfiddle.net/bn0rwt4h/1/) – Endless Nov 16 '20 at 20:29
Your answer is used by chatgpt :) https://chat.openai.com/share/db0357db-4988-46de-81f4-b0be5b61a30e – Sharikov Vladislav May 31 '23 at 10:02
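
The generator idea raised in the comments can be sketched roughly as follows. This is an illustration, not the answer author's code: `parseCSVRows` is a hypothetical name, and it uses the same character-by-character scan but yields each row as soon as it is complete, so the caller does not have to hold the whole table in memory.

```javascript
// Hypothetical generator variant of the character-scanning approach.
function* parseCSVRows(str) {
  let row = [''];      // current row; the last element is the field being built
  let quote = false;   // true while inside a quoted field

  for (let c = 0; c < str.length; c++) {
    const cc = str[c], nc = str[c + 1];

    // Escaped quote ("") inside a quoted field: keep one quote, skip the other
    if (cc === '"' && quote && nc === '"') { row[row.length - 1] += cc; c++; continue; }

    // A lone quote opens or closes a quoted field
    if (cc === '"') { quote = !quote; continue; }

    // An unquoted comma starts the next field
    if (cc === ',' && !quote) { row.push(''); continue; }

    // An unquoted CRLF, LF, or CR ends the row
    if ((cc === '\n' || cc === '\r') && !quote) {
      if (cc === '\r' && nc === '\n') c++;
      yield row;
      row = [''];
      continue;
    }

    row[row.length - 1] += cc;
  }

  // Emit the final row unless the input ended with a line break
  if (row.length > 1 || row[0] !== '') yield row;
}

for (const row of parseCSVRows('a,b\r\n"x, ""y""",2\n')) {
  console.log(row);
}
// → [ 'a', 'b' ]
// → [ 'x, "y"', '2' ]
```

Unlike the function in the answer, this sketch keeps a trailing empty field (`a,` yields `['a', '']`), and it drops a final unterminated line that contains nothing but an empty field.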

score 43 · Answer 4 · edited Sep 01 '20 at 13:32

I have an implementation as part of a spreadsheet project.

This code is not yet tested thoroughly, but anyone is welcome to use it.

As some of the answers noted though, your implementation can be much simpler if you actually have a DSV or TSV file, as they disallow the use of the record and field separators in the values. CSV, on the other hand, can actually have commas and newlines inside a field, which breaks most regular expression and split-based approaches.

var CSV = {
    parse: function(csv, reviver) {
        reviver = reviver || function(r, c, v) { return v; };
        var chars = csv.split(''), c = 0, cc = chars.length, start, end, table = [], row;
        while (c < cc) {
            table.push(row = []);
            while (c < cc && '\r' !== chars[c] && '\n' !== chars[c]) {
                start = end = c;
                if ('"' === chars[c]){
                    start = end = ++c;
                    while (c < cc) {
                        if ('"' === chars[c]) {
                            if ('"' !== chars[c+1]) {
                                break;
                            }
                            else {
                                chars[++c] = ''; // unescape ""
                            }
                        }
                        end = ++c;
                    }
                    if ('"' === chars[c]) {
                        ++c;
                    }
                    while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) {
                        ++c;
                    }
                } else {
                    while (c < cc && '\r' !== chars[c] && '\n' !== chars[c] && ',' !== chars[c]) {
                        end = ++c;
                    }
                }
                row.push(reviver(table.length-1, row.length, chars.slice(start, end).join('')));
                if (',' === chars[c]) {
                    ++c;
                }
            }
            if ('\r' === chars[c]) {
                ++c;
            }
            if ('\n' === chars[c]) {
                ++c;
            }
        }
        return table;
    },

    stringify: function(table, replacer) {
        replacer = replacer || function(r, c, v) { return v; };
        var csv = '', c, cc, r, rr = table.length, cell;
        for (r = 0; r < rr; ++r) {
            if (r) {
                csv += '\r\n';
            }
            for (c = 0, cc = table[r].length; c < cc; ++c) {
                if (c) {
                    csv += ',';
                }
                cell = replacer(r, c, table[r][c]);
                if (/[,\r\n"]/.test(cell)) {
                    cell = '"' + cell.replace(/"/g, '""') + '"';
                }
                csv += (cell || 0 === cell) ? cell : '';
            }
        }
        return csv;
    }
};

This is one of my favorite answers. It's a real parser implemented in not a lot of code. — Trevor Dixon, Dec 20 '12 at 07:15
If a comma is placed at the end of a line, an empty cell should follow it. This code just skips to the next line, resulting in an `undefined` cell. For example, `console.log(CSV.parse("first,last,age\r\njohn,doe,"));` — skibulk, Aug 21 '16 at 12:39
Also, empty cells should parse to empty strings. This code parses them into zeros, which is confusing since cells can actually contain zeros: `console.log(CSV.parse("0,,2,3"));` — skibulk, Aug 21 '16 at 12:58
@skibulk Your second comment is incorrect (at least in Chrome it works fine with your example). Your first comment is valid though, although it is easily fixed - add the following right before `if ('\r' === chars[c]) { ... }`: `if (end === c-1) { row.push(reviver(table.length-1, row.length, '')); }` — coderforlife, Nov 07 '16 at 21:01
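
To illustrate the answer's remark about TSV being simpler: when values are guaranteed never to contain a tab or a line break, and the format uses no quoting, two splits are enough. The sketch below rests on exactly those assumptions, and `parseTSV` is a hypothetical name that is not part of the answer.

```javascript
// Hypothetical helper: only valid when fields contain no tabs or line breaks.
function parseTSV(text) {
  return text
    .split(/\r\n|\r|\n/)              // accept CRLF, CR, or LF line endings
    .filter(line => line.length > 0)  // skip blank lines, including a trailing one
    .map(line => line.split('\t'));   // no quoting rules to honor
}

console.log(parseTSV('id\tname\r\n1\tJames\n2\tJimmy Smith, Esq.\n'));
// → [ [ 'id', 'name' ], [ '1', 'James' ], [ '2', 'Jimmy Smith, Esq.' ] ]
```

The same two-split approach applied to real CSV would break on the `"Jimmy Smith, Esq."` style of value, which is why the parser in the answer walks the characters instead.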

score 16 · Answer 5 · edited Sep 01 '20 at 13:36

csvToArray v1.3

A compact (645 bytes), but compliant function to convert a CSV string into a 2D array, conforming to the RFC4180 standard.

https://code.google.com/archive/p/csv-to-array/downloads

Common Usage: jQuery

 $.ajax({
        url: "test.csv",
        dataType: 'text',
        cache: false
 }).done(function(csvAsString){
        csvAsArray=csvAsString.csvToArray();
 });

Common usage: JavaScript

csvAsArray = csvAsString.csvToArray();

Override field separator

csvAsArray = csvAsString.csvToArray("|");

Override record separator

csvAsArray = csvAsString.csvToArray("", "#");

Override Skip Header

csvAsArray = csvAsString.csvToArray("", "", 1);

Override all

csvAsArray = csvAsString.csvToArray("|", "#", 1);

edited by Peter Mortensen

answered Apr 17 '13 at 10:53

dt192


This sounds interesting but I can't find the code now. Can you post it again? – Sam Watkins Apr 03 '18 at 03:37
I've updated the main post with a current link. Many thanks. – dt192 Apr 04 '18 at 09:25
It is in the Google Code archive, but perhaps update to a new location? – Peter Mortensen Sep 01 '20 at 13:36
The examples in this answer may not work as I have seen the source code has been changed. The modified version of the above examples for csvToArray v2.1 should be like this: Override field separator `csvAsArray = csvAsString.csvToArray({fSep: "|"});` Override record separator `csvAsArray = csvAsString.csvToArray({rSep: "#"});` Override Skip Header `csvAsArray = csvAsString.csvToArray({head: true});` Override all `csvAsArray = csvAsString.csvToArray({fSep: "|", rSep: "#", head: true});` – Faisal Khan Aug 01 '22 at 14:11

score 14 · Answer 6 · answered Aug 15 '12 at 19:36

Here's my PEG(.js) grammar that seems to do ok at RFC 4180 (i.e. it handles the examples at http://en.wikipedia.org/wiki/Comma-separated_values):

start
  = [\n\r]* first:line rest:([\n\r]+ data:line { return data; })* [\n\r]* { rest.unshift(first); return rest; }

line
  = first:field rest:("," text:field { return text; })*
    & { return !!first || rest.length; } // ignore blank lines
    { rest.unshift(first); return rest; }

field
  = '"' text:char* '"' { return text.join(''); }
  / text:[^\n\r,]* { return text.join(''); }

char
  = '"' '"' { return '"'; }
  / [^"]

Try it out at http://jsfiddle.net/knvzk/10 or http://pegjs.majda.cz/online. Download the generated parser at https://gist.github.com/3362830.

Trevor Dixon


PEG? Isn't building an AST a little memory heavy for a Type III grammar? Can it handle fields that contain newline chars? That's the most difficult case to cover in a 'regular grammar' parser. Either way, +1 for a novel approach. – Evan Plaice Jan 31 '13 at 01:37
Yes, it handles newline inside a field. – Trevor Dixon Jan 31 '13 at 03:52
Nice... With that alone, it's better than 95% of all the implementations I have ever seen. If you want to check for full RFC compliance, take a look at the tests here (http://jquery-csv.googlecode.com/git/test/test.html). – Evan Plaice Jan 31 '13 at 18:24
Well played. +1 for turning me on to PEG. I do love parser-generators. "Why program by hand in five days what you can spend five years of your life automating?" -- Terence Parr, ANTLR – Subfuzion Mar 28 '13 at 22:51

Stephen Quan · Answer 7 · 2022-09-13T02:11:13.167

7

Here's another solution. This uses:

- a coarse global regular expression for splitting the CSV string (the matches include surrounding quotes and trailing commas)
- a fine-grained regular expression for cleaning up the surrounding quotes and trailing commas
- type correction that distinguishes strings, numbers, boolean values, and null values

For the following input string:

"This is\, a value",Hello,4,-123,3.1415,'This is also\, possible',true,

The code outputs:

[
  "This is, a value",
  "Hello",
  4,
  -123,
  3.1415,
  "This is also, possible",
  true,
  null
]

Here's my implementation of parseCSVLine() in a runnable code snippet:

function parseCSVLine(text) {
  return text.match( /\s*(\"[^"]*\"|'[^']*'|[^,]*)\s*(,|$)/g ).map( function (text) {
    let m;
    if (m = text.match(/^\s*,?$/)) return null; // null value
    if (m = text.match(/^\s*\"([^"]*)\"\s*,?$/)) return m[1]; // Double Quoted Text
    if (m = text.match(/^\s*'([^']*)'\s*,?$/)) return m[1]; // Single Quoted Text
    if (m = text.match(/^\s*(true|false)\s*,?$/)) return m[1] === "true"; // Boolean
    if (m = text.match(/^\s*((?:\+|\-)?\d+)\s*,?$/)) return parseInt(m[1]); // Integer Number
    if (m = text.match(/^\s*((?:\+|\-)?\d*\.\d*)\s*,?$/)) return parseFloat(m[1]); // Floating Number
    if (m = text.match(/^\s*(.*?)\s*,?$/)) return m[1]; // Unquoted Text
    return text;
  } );
}

let data = `"This is\, a value",Hello,4,-123,3.1415,'This is also\, possible',true,`;
let obj = parseCSVLine(data);
console.log( JSON.stringify( obj, undefined, 2 ) );

answered Aug 11 '20 at 22:55


This is very neat! Is this part of a npm package somewhere? – cstrat Jun 16 '21 at 06:36
I made one change to the first regex: `text.match( /\s*(\".*?\"|'.*?'|[^,]+|)\s*(,|$)/g )` I had to add the last `|` to the first capture group to allow for an empty cell in a CSV. – cstrat Jun 18 '21 at 09:47
Now I quickly realised this created another edge case for me where it matches an empty string at the end superfluously. Tried adding a negative lookahead to not count an empty at the end: `text.match(/\s*(".*?"|'.*?'|[^,]+|(?!$))\s*(,|$)/g)` This created another issue where I can't have an empty last cell. I might go back to the original fix and just filter out extra empty cells in the last column. – cstrat Jun 18 '21 at 09:59
How do you handle multiple entries, with each entry having multi-line columns? – B''H Bi'ezras -- Boruch Hashem Jun 03 '22 at 08:11
@cstrat I changed the last match from `[^,]+` to `[^,]*` so that it now matches an empty cell and returns it as null. I have updated the example to reflect it. @BoruchHashem I have replaced the `(\".*?\")` with `(\"[^"]*\")` so that it can now match multiline double-quoted strings. I made a similar change for single-quoted strings. – Stephen Quan Sep 13 '22 at 02:18
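
Since `parseCSVLine()` processes one record at a time, input with several records first has to be cut at line breaks that fall outside quotes. A minimal sketch of that step is below. It assumes only double quotes delimit multi-line values (single-quoted values spanning lines are not tracked), and `splitRecords` is a hypothetical name that is not part of the answer.

```javascript
// Hypothetical helper: split CSV text into records, ignoring line breaks
// that occur inside double-quoted values.
function splitRecords(text) {
  const records = [];
  let current = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    // Every double quote toggles the state; an escaped pair ("") toggles twice
    if (ch === '"') inQuotes = !inQuotes;

    if (!inQuotes && (ch === '\n' || ch === '\r')) {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      records.push(current);
      current = '';
      continue;
    }
    current += ch;
  }

  if (current !== '') records.push(current); // last record without a line break
  return records;
}

console.log(splitRecords('a,"x\ny"\r\nb,2\n'));
// → [ 'a,"x\ny"', 'b,2' ]
```

Each record can then be handed to the answer's function, for example `splitRecords(text).map(parseCSVLine)`.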

score 5 · Answer 8 · edited Sep 01 '20 at 13:42

Here's my simple vanilla JavaScript code:

let a = 'one,two,"three, but with a comma",four,"five, with ""quotes"" in it.."'
console.log(splitQuotes(a))

function splitQuotes(line) {
  if(line.indexOf('"') < 0) 
    return line.split(',')

  let result = [], cell = '', quote = false;
  for(let i = 0; i < line.length; i++) {
    const char = line[i]
    if(quote && char == '"' && line[i+1] == '"') {
      // Escaped quote ("") inside a quoted cell: keep a single quote
      cell += char
      i++
    } else if(char == '"') {
      // Opening or closing quote of a cell
      quote = !quote;
    } else if(!quote && char == ',') {
      result.push(cell)
      cell = ''
    } else {
      cell += char
    }
  }
  // Push the final cell, even when it is empty (trailing comma),
  // so the result matches what line.split(',') would return
  result.push(cell)
  return result
}

score 3 · Answer 9 · edited Sep 01 '20 at 13:17

I'm not sure why I couldn't get Kirtan's example to work for me. It seemed to be failing on empty fields or maybe fields with trailing commas...

This one seems to handle both.

I did not write the parser code, just a wrapper around the parser function to make this work for a file. See attribution.

    var Strings = {
        /**
         * Wrapped CSV line parser
         * @param s      String delimited CSV string
         * @param sep    Separator override
         * @attribution: http://www.greywyvern.com/?post=258 (comments closed on blog :( )
         */
        parseCSV : function(s,sep) {
            // http://stackoverflow.com/questions/1155678/javascript-string-newline-character
            var universalNewline = /\r\n|\r|\n/g;
            var a = s.split(universalNewline);
            for (var i = 0; i < a.length; i++) {
                // Walk the naive split from right to left, re-joining pieces
                // that belong to the same quoted field
                for (var f = a[i].split(sep = sep || ","), x = f.length - 1, tl; x >= 0; x--) {
                    if (f[x].replace(/"\s+$/, '"').charAt(f[x].length - 1) == '"') {
                        if ((tl = f[x].replace(/^\s+"/, '"')).length > 1 && tl.charAt(0) == '"') {
                            f[x] = f[x].replace(/^\s*"|"\s*$/g, '').replace(/""/g, '"');
                        } else if (x) {
                            f.splice(x - 1, 2, [f[x - 1], f[x]].join(sep));
                        } else f = f.shift().split(sep).concat(f);
                    } else f[x] = f[x].replace(/""/g, '"'); // assign the result; replace() does not mutate
                }
                a[i] = f;
            }
            return a;
        }
    }

Peter Thoeny · Answer 10 · 2020-05-24T22:29:34.663

Regular expressions to the rescue! These few lines of code handle properly quoted fields with embedded commas, quotes, and newlines based on the RFC 4180 standard.

function parseCsv(data, fieldSep, newLine) {
    fieldSep = fieldSep || ',';
    newLine = newLine || '\n';
    var nSep = '\x1D';
    var qSep = '\x1E';
    var cSep = '\x1F';
    var nSepRe = new RegExp(nSep, 'g');
    var qSepRe = new RegExp(qSep, 'g');
    var cSepRe = new RegExp(cSep, 'g');
    var fieldRe = new RegExp('(?<=(^|[' + fieldSep + '\\n]))"(|[\\s\\S]+?(?<![^"]"))"(?=($|[' + fieldSep + '\\n]))', 'g');
    var grid = [];
    data.replace(/\r/g, '').replace(/\n+$/, '').replace(fieldRe, function(match, p1, p2) {
        // Mask newlines, escaped quotes, and field separators inside quoted fields
        return p2.replace(/\n/g, nSep).replace(/""/g, qSep).split(fieldSep).join(cSep);
    }).split(/\n/).forEach(function(line) {
        var row = line.split(fieldSep).map(function(cell) {
            // Restore the masked characters in each cell
            return cell.replace(nSepRe, newLine).replace(qSepRe, '"').replace(cSepRe, fieldSep);
        });
        grid.push(row);
    });
    return grid;
}

const csv = 'A1,B1,C1\n"A ""2""","B, 2","C\n2"';
const separator = ',';      // field separator, default: ','
const newline = ' <br /> '; // newline representation in case a field contains newlines, default: '\n' 
var grid = parseCsv(csv, separator, newline);
// expected: [ [ 'A1', 'B1', 'C1' ], [ 'A "2"', 'B, 2', 'C <br /> 2' ] ]

You don't need a parser-generator such as lex/yacc. The regular expression handles RFC 4180 properly thanks to positive lookbehind, negative lookbehind, and positive lookahead.

Clone/download code at https://github.com/peterthoeny/parse-csv-js
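
Because the pattern relies on lookbehind, which older JavaScript engines reject with a `SyntaxError`, it may be worth probing for support before calling `parseCsv()`. The following is a rough sketch of such a probe; `supportsLookbehind` is a hypothetical helper that is not part of the answer or its repository.

```javascript
// Hypothetical feature probe for regular-expression lookbehind.
function supportsLookbehind() {
  try {
    // Built with the RegExp constructor so that this file still parses on
    // engines without lookbehind; the constructor throws there instead.
    new RegExp('(?<=a)b');
    return true;
  } catch (err) {
    return false;
  }
}

console.log(supportsLookbehind()
  ? 'lookbehind available'
  : 'lookbehind missing: fall back to a character-scanning parser');
```

The answer's function already builds its pattern with `new RegExp(...)`, so on an unsupported engine the error would surface at call time rather than at load time.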

Regexps are implemented using finite state machines so you do, in fact, need FSM. — Henry Henrinson, Apr 30 '20 at 18:26
@HenryHenrinson: Not necessarily. I challenge you to find an issue with above code. I use it in production. It's also possible to do more complex parsing with regular expressions. You don't need an LL parser to create a syntax tree. Here is a blog: How to Use Regular Expressions to Parse Nested Structures, https://twiki.org/cgi-bin/view/Blog/BlogEntry201109x3 — Peter Thoeny, May 02 '20 at 04:36
@HenryHenrinson: Oh, yes, dummy me, we are in violent agreement :-) — Peter Thoeny, May 24 '20 at 22:16

bosscube · Answer 11 · 2021-10-31T15:32:30.950

Just throwing this out there. I recently ran into the need to parse CSV columns with JavaScript, and I opted for my own simple solution. It works for my needs, and may help someone else.

const csvString = '"Some text, some text",,"",true,false,"more text","more,text, more, text ",true';

const parseCSV = text => {
  const lines = text.split('\n');
  const output = [];

  lines.forEach(line => {
      line = line.trim();

      if (line.length === 0) return;

      const skipIndexes = {};
      const columns = line.split(',');

      output.push(columns.reduce((result, item, index) => {
          if (skipIndexes[index]) return result;

          if (item.startsWith('"') && !item.endsWith('"')) {
              while (!columns[index + 1].endsWith('"')) {
                  index++;
                  item += `,${columns[index]}`;
                  skipIndexes[index] = true;
              }

              index++;
              skipIndexes[index] = true;
              item += `,${columns[index]}`;
          }

          result.push(item);
          return result;
      }, []));
  });

  return output;
};

console.log(parseCSV(csvString));

Zero14 · Answer 12 · 2023-05-25T17:57:49.983

Personally, I like to use the Deno standard library, since most of its modules are officially compatible with the browser.

The catch is that the standard library is written in TypeScript, although an official solution might arrive in the future: https://github.com/denoland/deno_std/issues/641 https://github.com/denoland/dotland/issues/1728

For now, there is an actively maintained on-the-fly transpiler: https://bundle.deno.dev/

So you can use it like this:

<script type="module">
import { parse } from "https://bundle.deno.dev/https://deno.land/std@0.126.0/encoding/csv.ts"
console.log(await parse("a,b,c\n1,2,3"))
</script>

You can also just vendor it:

curl https://bundle.deno.dev/https://deno.land/std@0.126.0/encoding/csv.ts --output src/csv.js

score -1 · Answer 13 · edited Sep 01 '20 at 13:41

I wrote this JavaScript code to parse a CSV string into an array. I find it better to break the whole CSV down into lines, then into fields, and process each accordingly. That should make it easy to change the code to suit your needs.

    //
    //
    // CSV to object
    //
    //

    const new_line_char = '\n';
    const field_separator_char = ',';

    function parse_csv(csv_str) {

        var result = [];

        let line_end_index_moved = false;
        let line_start_index = 0;
        let line_end_index = 0;
        let csr_index = 0;
        let cursor_val = csv_str[csr_index];
        let found_new_line_char = get_new_line_char(csv_str);
        let in_quote = false;

        // Handle \r\n
        if (found_new_line_char == '\r\n') {
            csv_str = csv_str.split(found_new_line_char).join(new_line_char);
        }
        // Handle the last character is not \n
        if (csv_str[csv_str.length - 1] !== new_line_char) {
            csv_str += new_line_char;
        }

        while (csr_index < csv_str.length) {
            if (cursor_val === '"') {
                in_quote = !in_quote;
            } else if (cursor_val === new_line_char) {
                if (in_quote === false) {
                    if (line_end_index_moved && (line_start_index <= line_end_index)) {
                        result.push(parse_csv_line(csv_str.substring(line_start_index, line_end_index)));
                        line_start_index = csr_index + 1;
                    } // Else: just ignore line_end_index has not moved or line has not been sliced for parsing the line
                } // Else: just ignore because we are in a quote
            }
            csr_index++;
            cursor_val = csv_str[csr_index];
            line_end_index = csr_index;
            line_end_index_moved = true;
        }

        // Handle \r\n
        if (found_new_line_char == '\r\n') {
            let new_result = [];
            let curr_row;
            for (var i = 0; i < result.length; i++) {
                curr_row = [];
                for (var j = 0; j < result[i].length; j++) {
                    curr_row.push(result[i][j].split(new_line_char).join('\r\n'));
                }
                new_result.push(curr_row);
            }
            result = new_result;
        }
        return result;
    }

    function parse_csv_line(csv_line_str) {

        var result = [];

        let field_end_index_moved = false; // declared so the assignment in the loop below does not create an implicit global
        let field_start_index = 0;
        let field_end_index = 0;
        let csr_index = 0;
        let cursor_val = csv_line_str[csr_index];
        let in_quote = false;

        // Pretend that the last char is the separator_char to complete the loop
        csv_line_str += field_separator_char;

        while (csr_index < csv_line_str.length) {
            if (cursor_val === '"') {
                in_quote = !in_quote;
            } else if (cursor_val === field_separator_char) {
                if (in_quote === false) {
                    if (field_start_index <= field_end_index) {
                        result.push(parse_csv_field(csv_line_str.substring(field_start_index, field_end_index)));
                        field_start_index = csr_index + 1;
                    } // Else: just ignore field_end_index has not moved or field has not been sliced for parsing the field
                } // Else: just ignore because we are in quote
            }
            csr_index++;
            cursor_val = csv_line_str[csr_index];
            field_end_index = csr_index;
            field_end_index_moved = true;
        }
        return result;
    }

    function parse_csv_field(csv_field_str) {
        const with_quote = (csv_field_str[0] === '"'); // declared locally to avoid an implicit global

        if (with_quote) {
            csv_field_str = csv_field_str.substring(1, csv_field_str.length - 1); // remove the start and end quotes
            csv_field_str = csv_field_str.split('""').join('"'); // handle double quotes
        }
        return csv_field_str;
    }

    // Initial method: check the first newline character only
    function get_new_line_char(csv_str) {
        if (csv_str.indexOf('\r\n') > -1) {
            return '\r\n';
        } else {
            return '\n';
        }
    }
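
For comparison, here is a more compact sketch of the same lines-then-fields idea. It is not the code above: the names splitOutsideQuotes, unquote and parseCsvTwoPass are made up for illustration, and it makes simplifying assumptions (a comma separator, \r\n normalized to \n everywhere, including inside quoted fields, which the code above restores and this sketch does not).

```javascript
// Split `text` on `separator`, ignoring separators that sit inside double quotes.
function splitOutsideQuotes(text, separator) {
  const parts = [];
  let start = 0;
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '"') {
      // An escaped quote ("") toggles twice, so the state comes out unchanged.
      inQuotes = !inQuotes;
    } else if (text[i] === separator && !inQuotes) {
      parts.push(text.slice(start, i));
      start = i + 1;
    }
  }
  parts.push(text.slice(start));
  return parts;
}

// Strip the surrounding quotes of a quoted field and collapse "" into ".
function unquote(field) {
  if (field.length >= 2 && field.startsWith('"') && field.endsWith('"')) {
    return field.slice(1, -1).replace(/""/g, '"');
  }
  return field;
}

// Pass 1: split into lines. Pass 2: split each line into fields and unquote them.
function parseCsvTwoPass(text) {
  // Assumption: normalize line endings and drop a single trailing newline.
  const normalized = text.replace(/\r\n/g, "\n").replace(/\n$/, "");
  return splitOutsideQuotes(normalized, "\n").map((line) =>
    splitOutsideQuotes(line, ",").map(unquote)
  );
}

console.log(parseCsvTwoPass('name,quote\r\nAnn,"said ""hi"", then left"\r\n'));
// rows: ["name", "quote"] and ["Ann", 'said "hi", then left']
```

Because the quote state is tracked in both passes, a comma or a newline inside a quoted field does not start a new field or record.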

score -9 · Answer 14 · edited Sep 01 '20 at 13:28


Just use .split(','):

var str = "How are you doing today?";
var n = str.split(" ");

answered Sep 17 '12 at 21:17 by Micah · edited Sep 01 '20 at 13:28 by Peter Mortensen

Why is this a bad answer? It is native, places string content into workable array... – Micah Sep 26 '12 at 15:41

*Lots* of reasons. First, it doesn't remove the double quotes on delimited values. Doesn't handle line splitting. Doesn't escape double-double quotes used to escape double quotes used in delimited values. Doesn't allow empty values. etc, etc... The flexibility of the CSV format makes it very easy to use but difficult to parse. I won't downvote this but only because I don't downvote competing answers. – Evan Plaice Oct 06 '12 at 00:51
Using split can be used to break out line by line, and to break out the line values. Seems like a simple solution, but then again, it is a small code and if used wisely, very powerful. – Micah Oct 16 '12 at 20:06

What about when you encounter a value that contains a newline char? A simple split function will incorrectly interpret it as the end of an entry instead of skipping over it like it should. Parsing CSV is a lot more complicated than just providing 2 split routines (one for newlines, one for delimiters). – Evan Plaice Oct 16 '12 at 21:40

(cont) Also split on null values (a,null,,value) returns nothing whereas it should return an empty string. Don't get me wrong, split is a good start if you are 100% positive that the incoming data won't break the parser, but creating a robust parser that can handle any data that is RFC 4180 compliant is significantly more complicated. – Evan Plaice Oct 16 '12 at 21:45

Evan, I think your JavaScript library is awesome. But here's another perspective - I appreciated this answer, as I am simply storing a series of numbers in a very predictable fashion. It is much more important to me to get guaranteed cross-browser JavaScript compatibility and maintainability as far into the future as possible, than to include a large (albeit well-written and well-tested) library. Different needs require different approaches. If I ever need real CSV power I will DEFINITELY commit to using your library! :-) – moodboom Feb 20 '13 at 01:01
.split() isn't a good solution for lots of cases, but the OP didn't say they were importing a spreadsheet or any sort of unknown data. If you assemble some integers into a CSV string and then need a way to split them into a JS array, why use anything more complicated than this? – DaveD May 16 '14 at 19:43
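
The trade-off discussed in these comments can be seen in a short sketch. The helper naiveParse is a made-up name for "split on newlines, then on commas"; the sample strings are illustrative only.

```javascript
// Naive approach: one split for records, one split for fields.
const naiveParse = (text) => text.split("\n").map((line) => line.split(","));

// Fine for predictable data, such as a plain list of numbers.
console.log("1,2,3".split(",").map(Number));
// [ 1, 2, 3 ]

// A comma inside a quoted value produces an extra column, and the quotes stay in the data.
console.log(naiveParse('1,"hello, world"'));
// one row with three fields: '1', '"hello', ' world"'

// A newline inside a quoted value is treated as the end of the record.
console.log(naiveParse('2,"line one\nline two"').length);
// 2 (the single record was split into two rows)
```

So split is adequate when you control the data and know it contains no quotes, embedded delimiters or embedded newlines; otherwise a quote-aware parser is needed.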
