1

I have the following regex:

const splitRegex = new RegExp('(".*?"|[^",]+)(?=\\s*,|\\s*$)', 'g');
const row = line.match(splitRegex);

It is meant to split a line on commas while keeping quoted strings intact and ignoring any commas inside them. However, it doesn't work with a string like: 6000.1, Basic "Internet" abc, 101, NO_VLAN, which returns just ["6000.1", "abc", "101", "NO_VLAN"]

I tried adding a regex for "a word followed by a space", ([^\s]*)(?=\s*), at the beginning and the end of the original one, but the result looks even worse: [ "6000.1,", "\"Internet\" abc,", " 101,", " NO_VLAN" ]

What I would like is ["6000.1", "Basic \"Internet\" abc", "101", "NO_VLAN"] or ["6000.1", "Basic \"Internet\"", "101", "NO_VLAN"] if the string is 6000.1, Basic "Internet", 101, NO_VLAN

Thank you.

lmngn23

2 Answers

1

You can use

text.match(/(?=\S)(?:"[^"]*"|[^",])+/g)

Or, if you need to include escape sequences:

text.match(/(?=\S)(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|[^",])+/g)

See the regex demo #1 and regex demo #2.

Details

  • (?=\S) - next char must be a non-whitespace char
  • (?:"[^"]*"|[^",])+ - one or more occurrences (+) of the pattern sequence defined in a non-capturing group ((?:...)):
    • "[^"]*" - either ", then 0 or more chars other than " and then a "
    • | - or
    • [^",] - any char other than " and ,.
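As a quick check of the breakdown above, here is a minimal sketch that applies the first pattern to both inputs from the question, plus one extra input with a comma inside the quotes. The helper name splitRow and the third sample line are assumptions added for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper; the pattern is the first one given in the answer.
function splitRow(line) {
  // String.prototype.match returns null when nothing matches,
  // so fall back to an empty array.
  return line.match(/(?=\S)(?:"[^"]*"|[^",])+/g) || [];
}

// The two inputs from the question.
const rowA = splitRow('6000.1, Basic "Internet" abc, 101, NO_VLAN');
const rowB = splitRow('6000.1, Basic "Internet", 101, NO_VLAN');
// Extra input (an assumption): a comma inside the quoted part.
const rowC = splitRow('1, "Smith, John" Jr, 3');

console.log(rowA); // [ '6000.1', 'Basic "Internet" abc', '101', 'NO_VLAN' ]
console.log(rowB); // [ '6000.1', 'Basic "Internet"', '101', 'NO_VLAN' ]
console.log(rowC); // [ '1', '"Smith, John" Jr', '3' ]
```

Because the lookahead only constrains the first character of each match, leading spaces are skipped without a separate trim step. Whitespace before a comma would still be kept at the end of a field.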

JavaScript demo:

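// String.raw keeps the backslashes, so the quoted part contains escaped quotes (\").
const text = String.raw`6000.1, Basic "Internet \"text\"" abc, 101, NO_VLAN`;
// The escape-aware pattern treats \" as part of the quoted substring.
console.log(text.match(/(?=\S)(?:"[^"\\]*(?:\\[\s\S][^"\\]*)*"|[^",])+/g));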
Wiktor Stribiżew
  • Thank you! For the 1st one, why can't it be `(?:\S"[^"]*"|[^",])+` ? – lmngn23 Aug 24 '20 at 21:43
  • @user545871 It would require a non-whitespace char followed by a double-quoted substring in the first alternative, so it would either overmatch or undermatch. We only want the first char of a match to be a non-whitespace char, to avoid trimming the results later. – Wiktor Stribiżew Aug 24 '20 at 22:13
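To make the difference discussed in these comments concrete, here is a small sketch that runs both patterns on the input from the question. The variable names are assumptions chosen for illustration.

```javascript
const sample = '6000.1, Basic "Internet" abc, 101, NO_VLAN';

// Variant from the comment: \S is consumed inside the first alternative,
// so a quoted substring only matches when a non-whitespace char precedes it.
const variantResult = sample.match(/(?:\S"[^"]*"|[^",])+/g);

// Pattern from the answer: the lookahead only checks the first char of a match.
const lookaheadResult = sample.match(/(?=\S)(?:"[^"]*"|[^",])+/g);

console.log(variantResult);
// [ '6000.1', ' Basic ', 'Internet', ' abc', ' 101', ' NO_VLAN' ]
console.log(lookaheadResult);
// [ '6000.1', 'Basic "Internet" abc', '101', 'NO_VLAN' ]
```

With the variant, the quotes are never consumed here (the opening quote follows a space), so the quoted field is split into pieces and the leading spaces stay in the results.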
0

This regex works with your example ((?=\S)[^,]+):

const regex = /((?=\S)[^,]+)/gm;
const str = `6000.1, Basic "Internet" abc, 101, NO_VLAN`;

console.log(str.match(regex));
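One caveat worth checking, sketched below: this pattern splits on every comma, so it matches the example above but would also split a quoted field that contains a comma. The second input and the variable names are assumptions added for illustration.

```javascript
const simpleSplit = /((?=\S)[^,]+)/gm;

// Input from the question: no commas inside the quotes.
const plainRow = '6000.1, Basic "Internet" abc, 101, NO_VLAN'.match(simpleSplit);
// Assumed input with a comma inside the quoted part.
const quotedCommaRow = '1, "Smith, John", 3'.match(simpleSplit);

console.log(plainRow);       // [ '6000.1', 'Basic "Internet" abc', '101', 'NO_VLAN' ]
console.log(quotedCommaRow); // [ '1', '"Smith', 'John"', '3' ]
```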
Charlesthk