1

Is there a way to split a CSV string with JavaScript when the separator can also occur as an escaped character inside a value? Other regex implementations solve this problem with a lookbehind, but since JavaScript does not support lookbehind, I wonder how I could accomplish this neatly using a regular expression.

A CSV line might look like this:

"This is\, a value",Hello,4,'This is also\, possible',true

This must be split into the following strings:

[0] => "This is\, a value"
[1] => Hello
[2] => 4
[3] => 'This is also\, possible'
[4] => true
Rudolf

4 Answers

1

Instead of trying to split, you can use a global match for everything that is not a comma, treating each quoted value as a single unit, with this pattern:

/"[^"]+"|'[^']+'|[^,]+/g
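As a rough usage sketch, the pattern above can be applied with String.prototype.match to the sample line from the question, which should yield the five fields listed there. The helper name matchFields is made up for illustration and is not part of the answer. One caveat: because every alternative uses +, empty fields (as in a,,b) are skipped rather than returned as empty strings.

```javascript
// Hypothetical helper (not from the answer): collect every field with a global match.
function matchFields(line) {
  const pattern = /"[^"]+"|'[^']+'|[^,]+/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return line.match(pattern) || [];
}

// String.raw keeps the backslashes exactly as they appear in the CSV line.
const quotedSample = String.raw`"This is\, a value",Hello,4,'This is also\, possible',true`;
console.log(matchFields(quotedSample));
```

Since the escaped commas in the sample sit inside quotes, the quoted alternatives consume them; an escaped comma in an unquoted value would still be treated as a separator by this pattern.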
Casimir et Hippolyte
  • Thank you Casimir. This will not work when you want to split the CSV line. But you are absolutely right, it does give the expected output. I could group this expression to get the result, but the end result will also include the comma. Personally I do find this solution a little bit better than (.*?[^\\])(,|$) since the result excludes the comma. – Rudolf Oct 10 '13 at 08:22
0

For example, you can use this regex:

(.*?[^\\])(,|$)

The regex lazily matches everything (.*?) up to the first comma that is not preceded by a \, or up to the end of the line.
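A minimal sketch of how this regex could be used, assuming the g flag is added and the first capture group is collected from each match so that the trailing comma is left out. The function name splitWithCapture is hypothetical. Note that empty fields (two commas in a row) are not handled correctly by this pattern, because [^\\] needs at least one character per field.

```javascript
// Hypothetical helper: run the regex globally and keep only group 1 (the field),
// dropping group 2 (the separating comma or the end of the line).
function splitWithCapture(line) {
  const pattern = /(.*?[^\\])(,|$)/g;
  return Array.from(line.matchAll(pattern), (m) => m[1]);
}

// String.raw keeps the backslashes exactly as they appear in the CSV line.
const escapedSample = String.raw`"This is\, a value",Hello,4,'This is also\, possible',true`;
console.log(splitWithCapture(escapedSample));
```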

Darka
  • Just a suggestion: if you add an explanation of what this regex does, it will be easier for the OP to understand and use. – Harry Oct 06 '13 at 11:26
  • will do next time too. Thanks @Harry – Darka Oct 06 '13 at 11:29
  • Yes, this works, but the result does include the comma. If I split the CSV line with this regex I'm pretty close to the result. – Rudolf Oct 10 '13 at 08:20
0

Here's some code that converts CSV to JSON (assuming the first row contains the property names). You can take the first part (array2d) and do other things with it very easily.

// split rows on \n or \r\n, since line endings vary between CSV files
const rows = rawCsvFile.split(/\r?\n/);

// find all commas, or chunks of text in quotes.  If not in quotes, consider it a split point
const splitPointsRegex = /"(""|[^"])+?"|,/g;
const array2d = rows.map((row) => {
    let lastPoint = 0;
    const cols: string[] = [];
    let match: RegExpExecArray | null;
    while ((match = splitPointsRegex.exec(row)) !== null) {
        if (match[0] === ",") {
            cols.push(row.substring(lastPoint, match.index));
            lastPoint = match.index + 1;
        }
    }
    cols.push(row.slice(lastPoint));

    // remove leading commas, wrapping quotes, and unneeded \r
    return cols.map((datum) => 
        datum.replace(/^,?"?|"$/g, "")
        .replace(/""/g, `\"`)
        .replace(/\r/g, "")
    );
})

// assuming the first row holds the prop names, create an array of objects keyed by those names
const out = [];
const propsRow = array2d[0];
array2d.forEach((row, i) => {
    if (i === 0) { return; }
    const addMe: any = {};
    row.forEach((datum, j) => {
        let parsedData: any;
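        // Number("") is 0, so check for blank values before treating the datum as numeric
        if (datum.trim() !== "" && isNaN(Number(datum)) === false) {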
            parsedData = Number(datum);
        } else if (datum === "TRUE") {
            parsedData = true;
        } else if (datum === "FALSE") {
            parsedData = false;
        } else {
            parsedData = datum;
        }
        addMe[propsRow[j]] = parsedData;
    });
    out.push(addMe);
});

console.log(out);
Seph Reed
0

With a lookbehind you can split on unescaped commas directly. Unfortunately, at the time of writing this doesn't work in Firefox, only in Chrome and Edge:

"abc\\,cde,efg".split(/(?<!\\),/) will result in ["abc\,cde", "efg"].

You will need to remove all (unescaped) escapes in a second step.
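The two steps could be sketched roughly as follows, assuming an engine with lookbehind support. The function name splitAndUnescape and the generic unescape rule (drop the backslash in front of any character) are assumptions for illustration, not part of the answer. A value that ends in an escaped backslash right before a separator would still be mis-split by this simple lookbehind.

```javascript
// Hypothetical helper: split on commas not preceded by a backslash,
// then strip the escaping backslashes in a second pass.
function splitAndUnescape(line) {
  return line
    .split(/(?<!\\),/)                               // step 1: split on unescaped commas
    .map((field) => field.replace(/\\(.)/g, "$1"));  // step 2: "\," becomes ","
}

// String.raw keeps the backslash exactly as it would appear in the CSV line.
console.log(splitAndUnescape(String.raw`abc\,cde,efg`));
```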

Brixomatic