-1

I need to parse a CSV file using JavaScript. Parsing a comma-separated string is easy when the values do not contain commas themselves: just use str.split(','). But how can it be done when the string looks like this:

12.0,trs,"xx-xx NY,US"

Sometimes all of the values are wrapped in single or double quotes:

"12.0","trs","xx-xx NY,US"

And sometimes there is also an additional space after the comma:

"12.0","trs", "xx-xx NY,US"
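For illustration, here is how the plain split approach breaks on the first input above. This is only a minimal sketch, and the variable names are made up for the example:

```javascript
// Naive approach: split on every comma, including the one inside the quotes.
const line = '12.0,trs,"xx-xx NY,US"';
const parts = line.split(',');

// The quoted value is cut in two, so the line appears to have 4 values
// instead of 3.
console.log(parts.length);
console.log(parts);
```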

I guess I need to use a regular expression, but I could not find a universal one that covers all cases. Please help!

  • Please post your attempted code – Logan Murphy Dec 09 '18 at 23:16
  • Just curious: why does the trivial, duplicate question "How to split comma separated string using JavaScript? [duplicate]" get almost 200 votes, while mine, which is more complex, still practical, and not yet answered, gets -1? Really interesting, even more so than the initial question, which I have already solved. – Филипп Цветков Dec 14 '18 at 23:27

4 Answers

1

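You can pass a regular expression to split.

var line = '"12.0","trs", "xx-xx NY,US"';
// Split on commas (plus any trailing spaces) that sit outside double quotes,
// i.e. commas followed by an even number of quote characters.
var splitter = /,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)/;
var fields = line.split(splitter);
// fields: ['"12.0"', '"trs"', '"xx-xx NY,US"'] (the wrapping quotes are kept)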
Wojtokuba
1

As the other answers mainly use regex, here is a naive implementation that parses each line by looping over its characters. It will clearly not be as efficient as the other answers; it is just one possible way to deal with your input.

const settings = {
  separator: ',',
  wrap: '"',
  ignore: ' '
};

const csv = [
  '12.0,trs,"xx-xx NY,US"',
  '"12.0","trs","xx-xx NY,US"',
  '"12.0","trs", "xx-xx NY,US"',
  '"12.0","trs", "xx-xx ""NY"",US"'
];

function parseLine( line ) {
  let 
    insideWrap = false, 
    isFirst = true;
  return line.split('').reduce( (columns, char) => {
    if (insideWrap) {
      if (char === settings.wrap) {
        insideWrap = false;
        return columns;
      }
    } else {
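      if (char === settings.wrap) {
        insideWrap = true;
        // A quote at the start of a field opens it and is dropped; a quote
        // seen mid-field is the second half of an escaped "" pair and is kept.
        if (isFirst) {
          return columns;
        }
      }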
      if (char === settings.separator) {
        isFirst = true;
        return columns;
      }
      if (isFirst && char === settings.ignore) {
        return columns;
      }
    }
    if (isFirst) {
      isFirst = false;
      columns.push('');
    }
    const idx = columns.length - 1;
    columns[idx] += char;
    return columns;
  }, []);
}

function getCsvData( lines ) {
  return lines.map( parseLine );
}

console.log( getCsvData( csv ) );
Icepickle
0

Use match instead of split, and repeatedly match either a run of characters that are neither commas nor double quotes, or a double quote followed by non-quote characters and a closing double quote (which matches commas inside quotes, as desired). Also put a negative lookahead for a space at the beginning of the pattern so that the first matched character is never a space:

const translate = str => console.log(
  str.match(/(?! )(?:[^",]+|"[^"]*")+/g)
);

[
  `12.0,trs,"xx-xx NY,US"`,
  `"12.0","trs","xx-xx NY,US"`,
  `"12.0","trs", "xx-xx NY,US"`
].forEach(translate);
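
The matches above still include the wrapping double quotes. As a possible follow-up, here is a minimal sketch that reuses the same pattern and then cleans each field. The helper name parseFields is hypothetical, and the sketch assumes that a doubled quote ("") inside a quoted value stands for one literal quote. Note that empty fields are still skipped by this pattern.

```javascript
// Hypothetical helper built on the pattern above: match the fields, then
// remove the wrapping quotes and collapse doubled quotes into single ones.
function parseFields(line) {
  const matches = line.match(/(?! )(?:[^",]+|"[^"]*")+/g) || [];
  return matches.map(field =>
    field.length >= 2 && field.startsWith('"') && field.endsWith('"')
      ? field.slice(1, -1).replace(/""/g, '"')
      : field
  );
}

console.log(parseFields('12.0,trs,"xx-xx NY,US"'));
console.log(parseFields('"12.0","trs", "xx-xx ""NY"",US"'));
```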
CertainPerformance
0

Parsing CSV with a regex is not a great idea unless you know ahead of time that the CSV will conform to very specific, predictable limits. In practice this is rarely the case, which is why people go to the trouble of writing CSV parsers. See here for a long discussion.
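
As one concrete illustration of such a limit, a match-based pattern like the one in another answer here silently drops empty fields. This is only a sketch; the sample row is made up for the example:

```javascript
// Pattern taken from the match-based answer on this page.
const pattern = /(?! )(?:[^",]+|"[^"]*")+/g;

// A row with an empty second column.
const row = 'a,,b';
const fields = row.match(pattern);

// Only two values come back, so the columns after the gap shift left.
console.log(fields);
```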

There are several parsers out there, and they are generally easy to use. For example, with Papa Parse you can just call parse on the string and stop worrying about the edge cases:

console.log(Papa.parse('12.0,trs,"xx-xx NY,US"').data[0])
console.log(Papa.parse('"12.0","trs","xx-xx NY,US"').data[0])
<script src="https://cdnjs.cloudflare.com/ajax/libs/PapaParse/4.6.2/papaparse.js"></script>
Mark