My first guess is to use a regular expression. You can try this one I've just whipped up (regex101 link):
/\s*(")?(.*?)\1\s*(?:,|$)/gm
This can be used to extract fields, so headers can be grabbed with it as well. The first capture group is used as an optional quote-grabber with a backreference (\1), so the actual data is in the second capture group.
Here's an example of it in use. I had to use a slice to cut off the last match, since allowing for blank fields with the * quantifier (things like f1,,f3) puts a zero-width match at the end of each line. This was easier to get rid of in code than with some regex trickery. One side effect: a line that ends in a delimiter (like f1,f2,) loses its trailing blank field, because the zero-width match that gets sliced off is that field. Finally, I've got 'extra_i' (where i is the column index) as a default/placeholder key if there are some extra columns not accounted for by the headers. You should probably swap that part out to fit your own needs.
/**
 * Takes a raw CSV string and converts it to an array of JavaScript objects.
 * @param {string} text The raw CSV string.
 * @param {string[]} [headers] An optional array of headers to use. If none are
 * given, they are pulled from the first line of `text`.
 * @param {string} [quoteChar='"'] A character to use as the encapsulating character.
 * @param {string} [delimiter=','] A character to use between columns.
 * @returns {object[]} An array of JavaScript objects containing headers as keys
 * and row entries as values.
 */
function csvToJson(text, headers, quoteChar = '"', delimiter = ',') {
  // Escape regex metacharacters so that delimiters like '|' or '.' are matched literally
  const esc = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const regex = new RegExp(`\\s*(${esc(quoteChar)})?(.*?)\\1\\s*(?:${esc(delimiter)}|$)`, 'gs');
  const match = line => [...line.matchAll(regex)]
    .map(m => m[2]) // we only want the second capture group
    .slice(0, -1); // cut off blank match at the end
  const lines = text.split('\n');
  const heads = headers ?? match(lines.shift());
  return lines.map(line => {
    return match(line).reduce((acc, cur, i) => {
      // Parse numeric strings as numbers (including 0); replace blank matches with `null`
      const num = Number(cur);
      const val = cur.length === 0 ? null : (cur.trim() === '' || Number.isNaN(num)) ? cur : num;
      const key = heads[i] ?? `extra_${i}`;
      return { ...acc, [key]: val };
    }, {});
  });
}
const testString = `name,age,quote
John,,Hello World
Mary,23,""Alas, What Can I do?""
Joseph,45,"Waiting, waiting, waiting"
"Donaldson Jones" , sixteen, ""Hello, "my" friend!""`;
console.log(csvToJson(testString));
console.log(csvToJson(testString, ['foo', 'bar', 'baz']));
console.log(csvToJson(testString, ['col_0']));
As a bonus, I've written this so that you can pass a list of strings to use as the headers instead, since I know firsthand that not all CSV files have a header row. When you pass headers, the first line of the text is treated as data.
Note: This regex approach does not work if your values have newlines in them, because it relies on splitting the string at the newlines. I did look into using this regular expression to split the lines only at newlines outside of quotes, which almost worked, but it took upwards of 30 seconds on anything longer than a few lines.
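To make the limitation concrete, here is a tiny sketch of what the plain split does to a quoted value that spans two lines. The sample data is made up for illustration.

```javascript
// Hypothetical sample: the second record has a newline inside its quoted field.
const csv = 'name,quote\nJohn,"line one\nline two"';

// The same naive split that the function above relies on.
const naiveLines = csv.split('\n');

console.log(naiveLines.length); // 3, although the CSV holds only 2 records
console.log(naiveLines);
// [ 'name,quote', 'John,"line one', 'line two"' ]
```

The quoted field is cut in half before the regex ever sees it, so no per-line pattern can put it back together.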
If you want full functionality, your best bet would be to find an existing parsing library, or to write your own: one that counts occurrences of quotes to figure out whether you're inside or outside a "cell" as you iterate through the characters.
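For the write-your-own route, a minimal sketch might look like the following. Instead of literally counting quotes it flips a boolean flag on each one, which amounts to asking whether the count so far is odd. The name parseCsvRows is hypothetical, and the sketch assumes RFC 4180-style quoting, where a doubled quote inside a quoted field stands for one literal quote. That is a different convention from the test string above, so rows like ""Alas, What Can I do?"" would not come out the same way.

```javascript
/**
 * Hypothetical helper, not part of the regex solution above.
 * Walks the text one character at a time and tracks whether the current
 * position is inside a quoted field, so that delimiters and newlines inside
 * quotes are kept as data. Returns an array of rows (arrays of strings).
 */
function parseCsvRows(text, quoteChar = '"', delimiter = ',') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === quoteChar) {
        if (text[i + 1] === quoteChar) {
          field += quoteChar; // assumption: a doubled quote is one literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // delimiters and newlines are plain data in here
      }
    } else if (ch === quoteChar) {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat \r\n as one line break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }

  // Flush the final record unless the text ended with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const sample = 'name,quote\nJohn,"line one\nline two"\nMary,"She said ""hi"", then left"';
console.log(parseCsvRows(sample));
// [
//   [ 'name', 'quote' ],
//   [ 'John', 'line one\nline two' ],
//   [ 'Mary', 'She said "hi", then left' ]
// ]
```

This is a single pass over the input, so it avoids the slowdown I hit with the split-outside-quotes regex. It is deliberately bare: unlike the regex, it does not trim whitespace around fields, blank lines come back as rows holding one empty string, and turning the rows into objects keyed by header is left to the caller.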