I want to match every item in a comma-delimited list. For this, the following regular expression works great:

/[^,]+/g

(Regex101 demo).

The problem is that I want to ignore any commas contained within either single or double quotes, and I'm unsure how to extend the above expression to do that.

Here's an example string:

abcd, efgh, ij"k,l", mnop, 'q,rs't

I want to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):

  1. abcd
  2. efgh
  3. ij"k,l"
  4. mnop
  5. 'q,rs't

Or:

abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
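
To make the problem concrete, here is a quick sketch of what the original pattern does with that string (the variable names are illustrative only). Every comma splits, including the two inside quotes, so it yields seven pieces instead of the five wanted:

```javascript
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// The naive pattern treats every comma as a delimiter,
// so the quoted chunks are cut in half.
const naive = input.match(/[^,]+/g);

console.log(naive.length); // 7
console.log(naive);
```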

How can I do this?


Three relevant questions exist, but none of them cater for both ' and " in JavaScript:

  1. Regex for splitting a string using space when not surrounded by single or double quotes - Java solution, doesn't appear to work in JavaScript.
  2. A regex to match a comma that isn't surrounded by quotes - Only matches on "
  3. Alternative to regex: match all instances not inside quotes - Only matches on "
James Donnelly
  • @WiktorStribiżew that fails when a space is next to a comma within quotes (https://regex101.com/r/cW5hM0/2). – James Donnelly Mar 14 '16 at 14:32
  • @WiktorStribiżew that one regards anything outside of quotes as a different match, regardless of a comma: https://regex101.com/r/cW5hM0/4 (notice the `k` character in that). – James Donnelly Mar 14 '16 at 14:36
  • ([^,]+".*"[^,])+|([^,]?'.*'[^,])+|([^,]+) with the g flag should work and capture the groups you want. – SamyQc Mar 14 '16 at 14:49

3 Answers

Okay, so your matching groups can contain:

  • Plain text (anything other than a comma or a quote)
  • A matching pair of " with no other " between them
  • A matching pair of ' with no other ' between them

So this should work:

/((?:[^,"']+|"[^"]*"|'[^']*')+)/g

RegEx101 Demo

As a nice bonus, you can put single quotes inside a double-quoted string, and vice versa. However, you'll probably need a state machine to support escaped double quotes inside double-quoted strings (e.g. "aa\"aa").
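
For the escaped-quote case, a state machine could look roughly like the sketch below. The function name splitOutsideQuotes is hypothetical, and the sketch assumes that backslash is the escape character and that any " or ' outside a quoted run opens a new one (so an unquoted apostrophe, as in don't, would be misread):

```javascript
// Hypothetical helper: split on commas that are outside quotes,
// honoring backslash escapes inside a quoted run (an assumption).
function splitOutsideQuotes(input) {
  const fields = [];
  let current = "";
  let quote = null; // the open quote character, or null when outside quotes

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (quote !== null) {
      current += ch;
      if (ch === "\\" && i + 1 < input.length) {
        // Keep the escaped character verbatim and skip over it.
        current += input[++i];
      } else if (ch === quote) {
        quote = null; // closing quote
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch; // opening quote
      current += ch;
    } else if (ch === ",") {
      fields.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

console.log(splitOutsideQuotes(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
console.log(splitOutsideQuotes('"aa\\"aa,bb", cc'));
```

Unlike the regex, this trims each field itself and leaves an unterminated quote open until the end of the input.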

Unfortunately, the regex also matches the leading space, so you'll have to trim the matches.
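
Putting the pattern and the trim together, a minimal sketch on the example string from the question (variable names are illustrative only):

```javascript
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;
const pattern = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

// match() returns null when nothing matches, hence the fallback to [].
const chunks = (input.match(pattern) || []).map((s) => s.trim());

console.log(chunks.length); // 5
console.log(chunks);
```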

Gustav Bertram

Use two lookaheads to assert that the matched comma is outside quotes:

/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
  • (?=(([^"]*"){2})*[^"]*$) asserts that an even number of double quotes follows the matched comma.
  • (?=(([^']*'){2})*[^']*$) makes the same assertion for single quotes.

PS: This doesn't handle unbalanced, nested, or escaped quotes.

RegEx Demo
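
A minimal sketch of using this with split() on the example string from the question. One adjustment is assumed here: the groups are made non-capturing, because String.prototype.split() inserts the contents of capturing groups into the result array. Variable names are illustrative only.

```javascript
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// Same two lookaheads, with (?:...) instead of (...) so that
// split() does not add captured text to the output.
const separator = /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

const parts = input.split(separator);

console.log(parts.length); // 5
console.log(parts);
```

Because the separator swallows the whitespace around each comma, the parts need no trimming. Each lookahead scans to the end of the input, so long inputs may be slow.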

anubhava

Try this in JavaScript

(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+

Demo
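
A minimal sketch of running this pattern in JavaScript with the g flag, on the example string from the question (variable names are illustrative only). Like the other match-based approaches, it keeps the space after each comma, so the matches are trimmed:

```javascript
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;
const pattern = /(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+/g;

// match() returns null when nothing matches, hence the fallback to [].
const items = (input.match(pattern) || []).map((s) => s.trim());

console.log(items.length); // 5
console.log(items);
```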

Add named groups for readability (named groups require an ES2018+ engine; remove the ?<name> parts for older JavaScript):

(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)

Demo

Explanation:

(?<double_quotes>"[^"\n]*") matches a double-quoted run: anything except " or a newline, wrapped in " = (1)
(?<single_quotes>'[^'\n]*') matches a single-quoted run: anything except ' or a newline, wrapped in ' = (2)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*')) matches (1) or (2) = (3)
[^,"'\n]* matches any run of text with no comma, quote, or newline = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*) matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+ matches one or more repetitions of (3)(w) = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+) matches (w)(3w+) = (4) (has quotes)
[^,\n]+ matches every other case: text with no comma or newline = (5) (simple)
So the final pattern is (4)|(5): has quotes, or simple.
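
In an engine that supports named groups (ES2018) and String.prototype.matchAll (ES2020), the named version can report which branch matched. The sketch below assumes such an engine; the classify helper is hypothetical and not part of the pattern itself:

```javascript
const pattern = /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

// Hypothetical helper: list each match with the branch that produced it.
function classify(input) {
  return Array.from(input.matchAll(pattern), (m) => ({
    text: m[0].trim(),
    // A group that did not take part in the match is undefined.
    kind: m.groups.has_quotes !== undefined ? "has_quotes" : "simple",
  }));
}

for (const { kind, text } of classify(`abcd, efgh, ij"k,l", mnop, 'q,rs't`)) {
  console.log(kind, text);
}
```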

Input

abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""

Output:

MATCH 1
simple  [0-4]   `abcd`
MATCH 2
simple  [5-9]   `efgh`
MATCH 3
simple  [10-15] ` ijkl`
MATCH 4
simple  [16-20] `abcd`
MATCH 5
simple  [21-26] ` efgh`
MATCH 6
has_quotes  [27-35] ` ij"k,l"`
double_quotes   [30-35] `"k,l"`
MATCH 7
simple  [36-41] ` mnop`
MATCH 8
has_quotes  [42-50] ` 'q,rs't`
single_quotes   [43-49] `'q,rs'`
MATCH 9
has_quotes  [51-59] `'q, rs't`
single_quotes   [51-58] `'q, rs'`
MATCH 10
has_quotes  [60-74] `"'q,rs't, ij"k`
double_quotes   [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes  [75-79] ` l""`
double_quotes   [77-79] `""`
Tim007