I am trying to understand how the regular expression below works.

 ,(?=([^\"]*\"[^\"]*\")*[^\"]*$) 

The expression matches all commas that are not enclosed in quotes.

For example:

data,data,data",with"quote,data",with"quote,data

will create the following tokens:

data,
data,
data",with"quote,
data",with"quote,
data

If I remove the `$` at the end of the regex, it matches all commas (including commas within quotes). It would be helpful if someone could explain why removing the `$` makes the expression match every comma.

user3053015
  • Does this answer your question? [Regex to split a CSV](https://stackoverflow.com/questions/18144431/regex-to-split-a-csv) – Robert Harvey Jan 20 '20 at 01:25
  • Thank you Robert. It is still not clear how the regex matches all commas (including commas enclosed in quotes) if the `$` sign is removed from the end. – user3053015 Jan 20 '20 at 01:33
  • Parsing CSV files is not a trivial operation, and it's hard to get it right. If I were doing it myself, I would not rely on regex; I would find [a first-class CSV parser](https://commons.apache.org/proper/commons-csv/apidocs/org/apache/commons/csv/CSVParser.html) to do the heavy lifting for me. – Robert Harvey Jan 20 '20 at 01:37
  • It would be helpful to get some information on how all commas are matched after removing the `$` sign. – user3053015 Jan 20 '20 at 01:44
  • The regex provided in your question doesn't work at all. https://regex101.com/r/vRVNff/1 – Robert Harvey Jan 20 '20 at 01:49
  • The regex I am using is
     ,(?=([^\"]*\"[^\"]*\")*[^\"]*$) 
    – user3053015 Jan 20 '20 at 01:53
  • OK. https://regex101.com/r/vRVNff/2 with the `$`, https://regex101.com/r/vRVNff/3 without. – Robert Harvey Jan 20 '20 at 02:01
  • @user3053015 the regex is looking for an even number (0, 2, 4, ...) of quotes between the comma and the end of line, so if the regex matches, then (as long as quotes are not nested) the comma is not within quotes. If you remove the `$`, then the regex will match on any even number of quotes (including 0) after the comma up to end of line **or another quote**, so it will match all commas, since it can now also match when there is an odd number of quotes between the comma and end of line. – Nick Jan 20 '20 at 02:04
  • Thank you Nick, so the `$` sign forces the regex to match an even number of quotes up to the end of the line. – user3053015 Jan 20 '20 at 02:30
  • @user3053015 exactly. – Nick Jan 20 '20 at 03:57
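
The explanation in the comments can be checked with a small script. This is only a sketch: it uses Python's `re` module rather than whatever engine the question was written for (the lookahead syntax is the same), and the helper names `comma_positions` and `commas_with_even_quotes_after` are made up for illustration. It lists the index of each comma matched with and without the `$`, then restates the anchored rule in plain Python by counting the quotes after each comma.

```python
import re

# The pattern from the comments, and the same pattern with the trailing $ removed.
WITH_ANCHOR = re.compile(r',(?=([^"]*"[^"]*")*[^"]*$)')
WITHOUT_ANCHOR = re.compile(r',(?=([^"]*"[^"]*")*[^"]*)')

# The sample line from the question.
LINE = 'data,data,data",with"quote,data",with"quote,data'


def comma_positions(pattern, text):
    """Return the index of every comma that the pattern matches in text."""
    return [m.start() for m in pattern.finditer(text)]


def commas_with_even_quotes_after(text):
    """Hypothetical helper: keep commas followed by an even number of quotes."""
    return [
        i for i, ch in enumerate(text)
        if ch == ',' and text[i + 1:].count('"') % 2 == 0
    ]


# With $, the lookahead must reach the end of the line, so only commas with an
# even number of quotes after them match; the commas at 15 and 32 are skipped.
print(comma_positions(WITH_ANCHOR, LINE))      # [4, 9, 26, 43]

# Without $, the lookahead may stop anywhere (even after zero characters),
# so it succeeds at every comma.
print(comma_positions(WITHOUT_ANCHOR, LINE))   # [4, 9, 15, 26, 32, 43]

# Counting quotes directly gives the same result as the anchored regex.
print(commas_with_even_quotes_after(LINE))     # [4, 9, 26, 43]
```

One caveat when adapting this: Python's `re.split` also returns the text of capturing groups, so splitting with this exact pattern would need the group written as non-capturing, `(?:...)`.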
