
I have the following regex to count the number of commas contained in a string, where the comma does not appear between quotes:

(?!\B"[^"]*),(?![^"]*"\B)

For the following string, I would get 4 returned:

a,b,c,"my my, a bit of text",e

However, if the start of the quoted string is a special character, it fails to count anything prior to the end of the quoted string:

a,b,c,"[20200624T013030 Umognog Wrote] my my, a bit of text",e

This will return just 1

I am trying to alter it to count all 4 occurrences in the second example without changing the result for the first example, but I am failing miserably at the first hurdle! Regex is not my kung fu :(

I know I need to escape the '[', but I really cannot figure this out and am grasping in the dark.

Umognog

1 Answer


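If the double quotes are balanced, one option is to assert that everything to the right of the current position, up to the end of the line, contains only balanced (paired) double quotes. A comma inside a quoted string is followed by an odd number of quotes, so the assertion fails there.

The newline in the negated character class prevents the match from crossing line breaks. You could also add \r if that is necessary.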

,(?=(?:[^"\n]*"[^"\n]*")*[^"\n]*$)

Explanation

  • , Match a comma
  • (?= Positive lookahead, assert that what is on the right is
    • (?:[^"\n]*"[^"\n]*")* Match 0+ times a pair of double quotes, together with any non-quote chars before the pair and between its quotes
    • [^"\n]* Match 0+ times any char except a double quote or a newline
    • $ Assert end of string
  • ) Close lookahead
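
The question does not say which language or regex engine is in use, so the following is only a minimal sketch in Python (an assumption) of how the pattern above could be used to count the commas. The helper name count_unquoted_commas is hypothetical and introduced purely for illustration.

```python
import re

# Pattern from the answer: a comma that is followed by an even number of
# double quotes before the end of the line, i.e. a comma outside quotes.
OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"\n]*"[^"\n]*")*[^"\n]*$)')


def count_unquoted_commas(text: str) -> int:
    """Hypothetical helper: count commas that are not inside double quotes."""
    return len(OUTSIDE_QUOTES.findall(text))


first = 'a,b,c,"my my, a bit of text",e'
second = 'a,b,c,"[20200624T013030 Umognog Wrote] my my, a bit of text",e'

print(count_unquoted_commas(first))   # 4
print(count_unquoted_commas(second))  # 4
```

Both example strings from the question give 4. For input that spans several lines, Python would additionally need the re.MULTILINE flag so that $ can match at the end of every line rather than only at the end of the whole string.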


The fourth bird