6

I have a long string containing CSV data from a file. I want to store it in a JavaScript Array of Arrays. But one column has arbitrary text in it. That text could contain double-quotes and commas.

Splitting the CSV string into separate row strings is no problem:

var theRows = theCsv.split(/\r?\n/);

But then how would I best split each row?

Since it's CSV data I need to split on commas. But

var theArray = new Array();
for (var i=0, i<theRows.length; i++) {
    theArray[i] = theRows[i].split(',');    
}

won't work for elements containing quotes and commas, like this example:

512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media

How can I make sure that 2nd element gets properly stored in a single array element as

 "Fake News" and the "Best Way" to deal with A, B, and C

Thanks.

The suggested solution which looked similar unfortunately did not work when I tried the CSVtoArray function there. Instead of returning array elements, a null value was returned, as described in my comment below.

Doug Lerner
  • 1,383
  • 2
  • 17
  • 36
  • I guess you have to find a way to split on commas that are *outside* (somehow) of quote marks. My first thought is a regex but I think it could quickly turn ugly. So, a string parsing approach might be better - something like going through the characters and noting quote marks - if there are any left open, then disregard the comma as a field separator and keep it as part of the field content. With all that said, there is bound to be a library that already does that for you.\ – VLAZ Feb 25 '19 at 06:39
  • Possible duplicate of [How can I parse a CSV string with Javascript, which contains comma in data?](https://stackoverflow.com/questions/8493195/how-can-i-parse-a-csv-string-with-javascript-which-contains-comma-in-data) – VLAZ Feb 25 '19 at 06:49
  • Checking the possible duplicate mentioned above to see if it fits. Will report back. Thanks. – Doug Lerner Feb 25 '19 at 07:01
  • Yeah, I'm not totally sure, either. But I decided to report it just in case it helps. Maybe it doesn't help you but somebody else who finds your question and needs that answer. – VLAZ Feb 25 '19 at 07:04
  • Unfortunately, this string returns null from CSVtoArray(text) in the other solution: `var text = '512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media';`. I would expect 4 elements: `512` and `"Fake News" and the "Best Way" to deal with A, B, and C` and `1/18/2019` and `media`. – Doug Lerner Feb 25 '19 at 08:18
  • See also https://stackoverflow.com/questions/11456850/split-a-string-by-commas-but-ignore-commas-within-double-quotes-using-javascript – sideshowbarker Feb 25 '19 at 08:26
  • Parse it from left to right. – Salman A Feb 25 '19 at 08:33
  • Can't you use semicolons ";" instead commas to split ech column? – Marco Dal Zovo Feb 25 '19 at 08:58
  • I was thinking the same thing. But which commas should I first replace with semi-colons? :) – Doug Lerner Feb 25 '19 at 09:51

1 Answers1

3

This should do it:

let parseRow = function(row) {
  let isInQuotes = false;
  let values = [];
  let val = '';

  for (let i = 0; i < row.length; i++) {
    switch (row[i]) {
      case ',':
        if (isInQuotes) {
          val += row[i];
        } else {
          values.push(val);
          val = '';
        }
        break;

      case '"':
        if (isInQuotes && i + 1 < row.length && row[i+1] === '"') {
          val += '"'; 
          i++;
        } else {
          isInQuotes = !isInQuotes
        }
        break;

      default:
        val += row[i];
        break;
    }
  }

  values.push(val);

  return values;
}

It will return the values in an array:

parseRow('512,"""Fake News"" and the ""Best Way"" to deal with A, B, and C", 1/18/2019,media');
// => ['512', '"Fake News" and the "Best Way" to deal with A, B, and C', ' 1/18/2019', 'media']

To get the requested array of arrays you can do:

let parsedCsv = theCsv.split(/\r?\n/).map(parseRow);

Explanation

The code might look a little obscure. But the principal idea is as follows: We parse the string character by character. When we encounter a " we set isInQuotes = true. This will change the behavior for parsing ,and "". When we encounter a single " we set isInQuotes = false again.

MaximeW
  • 430
  • 3
  • 13