
I have two elements in an array.

"3a49c9bb-caaf-48d8-b77c-45161cdf8ff5,Stadtwerke Feldkirch,AT006110,\"Verteilernetzbetreiber,Strom\",http://www.stadtwerke-feldkirch.at ,,,2023-02-13T16:05:03.452Z,2023-02-13T16:05:03.452Z\r"

"a601f77b-c7ed-40a8-a639-9baa25ca28a5,Revertera'sches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r"

The first element is a valid CSV string, but the second is not, and I cannot figure out why.

I use this function to convert a string to an array:

const data = ["3a49c9bb-caaf-48d8-b77c-45161cdf8ff5,Stadtwerke Feldkirch,AT006110,\"Verteilernetzbetreiber,Strom\",http://www.stadtwerke-feldkirch.at ,,,2023-02-13T16:05:03.452Z,2023-02-13T16:05:03.452Z\r", "a601f77b-c7ed-40a8-a639-9baa25ca28a5,Revertera'sches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r"];

function CSVtoArray(text) {
    var re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
    var re_value = /(?!\s*$)\s*(?:'([^'\\]*(?:\\[\S\s][^'\\]*)*)'|"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,'"\s\\]*(?:\s+[^,'"\s\\]+)*))\s*(?:,|$)/g;
    // Return NULL if input string is not well formed CSV string.
    if (!re_valid.test(text)) return null
    var a = []; // Initialize array to receive values.
    text.replace(re_value, // "Walk" the string using replace with callback.
        function(m0, m1, m2, m3) {
            // Remove backslash from \' in single quoted values.
            if (m1 !== undefined) a.push(m1.replace(/\\'/g, "'"));
            // Remove backslash from \" in double quoted values.
            else if (m2 !== undefined) a.push(m2.replace(/\\"/g, '"'));
            else if (m3 !== undefined) a.push(m3);
            return ''; // Return empty string.
        });
    // Handle special case of empty last value.
    if (/,\s*$/.test(text)) a.push('');
    return a;
}

console.log(data.map(CSVtoArray));

I got this function from the question "How can I parse a CSV string with JavaScript, which contains comma in data?".

I have 113 elements, and this is the only one that returns `null`. I don't know why.

A minimal example:

const re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;
const text1 = "3a49c9bb-caaf-48d8-b77c-45161cdf8ff5,Stadtwerke Feldkirch,AT006110,\"Verteilernetzbetreiber,Strom\",http://www.stadtwerke-feldkirch.at ,,,2023-02-13T16:05:03.452Z,2023-02-13T16:05:03.452Z\r";
const text2 = "a601f77b-c7ed-40a8-a639-9baa25ca28a5,Revertera'sches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r";
const text3 = "a601f77b-c7ed-40a8-a639-9baa25ca28a5,Reverterasches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r";

console.log(re_valid.test(text1));
console.log(re_valid.test(text2));
console.log(re_valid.test(text3));
jabaa
bill.gates
  • So actually you're asking why `re_valid.test(text)` returns `false`? That's the only place where the function could return `null`? Have you confirmed it using your debugger? – jabaa Mar 24 '23 at 12:35
  • @jabaa I know that it's invalid; the question is why. – bill.gates Mar 24 '23 at 12:46
  • The problem is the single quote (`'`) in `Revertera'sches`. The regular expression doesn't allow it. – jabaa Mar 24 '23 at 12:56
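To see that point in isolation, here is a small check with the original `re_valid`, copied unchanged from the question. The short sample strings are made up for illustration: a `'` in the middle of an unquoted value is rejected, while a value fully wrapped in single quotes is accepted, because the pattern treats `'` as a quote character and excludes it from unquoted values.

```javascript
// Original validation regex from the question, unchanged.
const re_valid = /^\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*(?:,\s*(?:'[^'\\]*(?:\\[\S\s][^'\\]*)*'|"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,'"\s\\]*(?:\s+[^,'"\s\\]+)*)\s*)*$/;

// Plain unquoted values are accepted.
console.log(re_valid.test("a,b"));     // true
// A ' inside an unquoted value is excluded by [^,'"\s\\], so the line is rejected.
console.log(re_valid.test("a,b'c"));   // false
// A ' at the start of a field opens a single-quoted value, so this is accepted.
console.log(re_valid.test("a,'b,c'")); // true
```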

1 Answer

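The regular expression accepts both single-quoted and double-quoted values. Because `'` is reserved as a quote character, it is excluded from unquoted values, so the `'` in the middle of `Revertera'sches` makes the whole line invalid. According to your example texts, you only need double-quoted values, so you have to modify the regular expression to drop the single-quote handling.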

const re_valid = /^\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,"\s\\]*(?:\s+[^,"\s\\]+)*)\s*(?:,\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,"\s\\]*(?:\s+[^,"\s\\]+)*)\s*)*$/;
const text1 = "3a49c9bb-caaf-48d8-b77c-45161cdf8ff5,Stadtwerke Feldkirch,AT006110,\"Verteilernetzbetreiber,Strom\",http://www.stadtwerke-feldkirch.at ,,,2023-02-13T16:05:03.452Z,2023-02-13T16:05:03.452Z\r";
const text2 = "a601f77b-c7ed-40a8-a639-9baa25ca28a5,Revertera'sches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r";

console.log(re_valid.test(text1));
console.log(re_valid.test(text2));
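Note that `CSVtoArray` also uses `re_value`, which still has the single-quote alternative and still excludes `'` from unquoted values. If only `re_valid` is swapped, `replace` skips past `Revertera'` and the second value comes out as `sches Elektrizitätswerk`. A minimal sketch of a double-quote-only version, assuming single quotes never act as quote characters in your files, drops the single-quote handling from both patterns. `CSVtoArrayDoubleQuoted` is a hypothetical name; otherwise the logic follows the original function.

```javascript
// Hypothetical double-quote-only variant of CSVtoArray.
// Assumption: fields are unquoted or wrapped in double quotes;
// a single quote (') is just an ordinary character.
function CSVtoArrayDoubleQuoted(text) {
  // Validation: each field is a double-quoted string (backslash escapes allowed)
  // or an unquoted run containing no comma, double quote or backslash.
  const re_valid = /^\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,"\s\\]*(?:\s+[^,"\s\\]+)*)\s*(?:,\s*(?:"[^"\\]*(?:\\[\S\s][^"\\]*)*"|[^,"\s\\]*(?:\s+[^,"\s\\]+)*)\s*)*$/;
  // Extraction: group 1 = double-quoted value, group 2 = unquoted value.
  const re_value = /(?!\s*$)\s*(?:"([^"\\]*(?:\\[\S\s][^"\\]*)*)"|([^,"\s\\]*(?:\s+[^,"\s\\]+)*))\s*(?:,|$)/g;

  // Return null if the input is not well formed under these rules.
  if (!re_valid.test(text)) return null;

  const a = [];
  text.replace(re_value, (m0, m1, m2) => {
    // Remove the backslash from \" in double-quoted values.
    if (m1 !== undefined) a.push(m1.replace(/\\"/g, '"'));
    else if (m2 !== undefined) a.push(m2);
    return "";
  });
  // Handle the special case of an empty last value.
  if (/,\s*$/.test(text)) a.push("");
  return a;
}

const text2 = "a601f77b-c7ed-40a8-a639-9baa25ca28a5,Revertera'sches Elektrizitätswerk,AT003540,\"Verteilernetzbetreiber,Strom\",http://www.revertera.at ,,,2023-02-13T16:05:03.492Z,2023-02-13T16:05:03.492Z\r";
console.log(CSVtoArrayDoubleQuoted(text2));
```

With this version the second value stays `Revertera'sches Elektrizitätswerk`, and a value such as `'b` is kept as-is rather than being treated as the start of a quoted field.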
jabaa
  • This works, thank you. However, some CSV files use `;` as the delimiter. Is it possible to check whether the delimiter is `,` or `;`? – bill.gates Mar 24 '23 at 13:22
  • @bill.gates I guess it's possible; you could probably modify the regular expression, but I'm not sure. If you're sure that the input data is always valid CSV, you can remove this validation. – jabaa Mar 24 '23 at 13:31
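One possible approach to the `;` question, sketched here as an assumption rather than a tested solution: build the validation regex from the delimiter instead of hard-coding `,`, and pick the delimiter by counting how often each candidate appears outside double quotes. `escapeRegExp`, `buildValidator`, `countOutsideQuotes` and `detectDelimiter` are hypothetical helper names, and like the answer this only handles double-quoted fields.

```javascript
// Hypothetical helpers: validate CSV lines with a configurable delimiter.
// Assumes fields are unquoted or double-quoted (no single-quote quoting).

// Escape characters that are special in a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\-]/g, "\\$&");
}

// Build the validation regex for a single-character delimiter.
function buildValidator(delim) {
  const d = escapeRegExp(delim);
  const field =
    `(?:"[^"\\\\]*(?:\\\\[\\S\\s][^"\\\\]*)*"` + // double-quoted value
    `|[^${d}"\\s\\\\]*(?:\\s+[^${d}"\\s\\\\]+)*)`; // unquoted value
  return new RegExp(`^\\s*${field}\\s*(?:${d}\\s*${field}\\s*)*$`);
}

// Count occurrences of ch outside double-quoted sections,
// skipping backslash-escaped characters inside quotes like the regex does.
function countOutsideQuotes(line, ch) {
  let inQuotes = false;
  let count = 0;
  for (let i = 0; i < line.length; i++) {
    const c = line[i];
    if (inQuotes && c === "\\") { i++; continue; }
    if (c === '"') inQuotes = !inQuotes;
    else if (!inQuotes && c === ch) count++;
  }
  return count;
}

// Pick the candidate that occurs most often outside quotes and still validates.
// Returns null if no candidate both occurs and validates.
function detectDelimiter(line, candidates = [",", ";"]) {
  let best = null;
  let bestCount = 0;
  for (const d of candidates) {
    const n = countOutsideQuotes(line, d);
    if (n > bestCount && buildValidator(d).test(line)) {
      best = d;
      bestCount = n;
    }
  }
  return best;
}

console.log(detectDelimiter('a;"b;c";d')); // ";"
console.log(detectDelimiter('a,"b,c",d')); // ","
```

Counting is there because validity alone cannot tell the delimiters apart: with `;` as the delimiter, an unquoted value may contain commas, so a line like `a,b,c` validates as a single field.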