Update:
I think the final version in a line should be:
var cells = (rows[i] + ',').split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/).slice(1).reduce((a, b) => (a.length > 0 && a[a.length - 1].length < 4) ? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]] : [...a, [b]], []).map(e => e.reduce((a, b) => a !== undefined ? a : b, undefined))
or put it more beautifully:
var cells = (rows[i] + ',')
.split(/(?: *?([^",]+?) *?,|" *?(.+?)" *?,|( *?),)/)
.slice(1)
.reduce(
(a, b) => (a.length > 0 && a[a.length - 1].length < 4)
? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
: [...a, [b]],
[],
)
.map(
e => e.reduce(
(a, b) => a !== undefined ? a : b, undefined,
),
)
;
This is rather long, but still looks purely functional. Let me explain it:
First, the regular expression part. Basically, a segment you want may fall into 3 possibilities:
*?([^",]+?) *?,
, which is a string without "
or ,
surrounded with spaces, followed by a ,
.
" *?(.+?)" *?,
, which is a string, surrounded with a pair of quotes and an indefinite number of spaces beyond the quotes, followed by a ,
.
( *?),
, which is an indefinite number of spaces, followed by a ','.
So splitting by a non-capturing group of a union of these three will basically get us to the answer.
Recall that when splitting with a regular expression, the resulting array consists of:
- Strings separated by the separator (the regular expression)
- All the capturing groups in the separator
In our case, the separators fill the whole string, so the strings separated are all empty strings, except that last desired part, which is left out because there is no ,
following it. Thus the resulting array should be like:
- An empty string
- Three strings, representing the three capturing groups of the first separator matched
- An empty string
- Three strings, representing the three capturing groups of the second separator matched
- ...
- An empty string
- The last desired part, left alone
So why simply adding a ,
at the end so that we can get a perfect pattern? This is how (rows[i] + ',')
comes about.
In this case the resulting array becomes capturing groups separated by empty strings. Removing the first empty string, they will appear in a group of 4 as [ 1st capturing group, 2nd capturing group, 3rd capturing group, empty string ].
What the reduce
block does is exactly grouping them into groups of 4:
.reduce(
(a, b) => (a.length > 0 && a[a.length - 1].length < 4)
? [...a.slice(0, a.length - 1), [...a[a.length - 1], b]]
: [...a, [b]],
[],
)
And finally, find the first non-undefined
elements (an unmatched capturing group will appear as undefined
. Our three patterns are exclusive in that any 2 of them cannot be matched simultaneously. So there is exactly 1 such element in each group) in each group which are precisely the desired parts:
.map(
e => e.reduce(
(a, b) => a !== undefined ? a : b, undefined,
),
)
This completes the solution.
I think the following should suffice:
var cells = rows[i].split(/([^",]+?|".+?") *, */).filter(e => e)
or if you don't want the quotes:
var cells = rows[i].split(/(?:([^",]+?)|"(.+?)") *, */).filter(e => e)