How can I change the following regex:

(?:(^|,)(?<quote>"|)(?<value>.*?)(\k<quote>)(?=(,|$)))

which works with: 1,1,-1 ... I get "1","1","-1"

and works with: "1","1","-1" ... I get "1","1","-1"


but it doesn't work as expected when one or more of the substrings are empty:

,1,-1 ...in such case I need to get: "", "1", "-1"

,"1","-1" ...in such case I need to get: "", "1", "-1"

,"1", ...in such case I need to get: "", "1", ""

,, ...in such case I need to get: "","",""

Is that possible?

Daniel Dušek
    Try `(?<=,|^)(?<quote>"?)(?<value>.*?)\k<quote>(?=,|$)` – Wiktor Stribiżew Jul 25 '22 at 13:20
    If you're trying to parse CSV files, you might want to consider [using a CSV parser](https://stackoverflow.com/q/2081418/87698) instead of a regular expression. There are a lot of edge cases (values containing doubled quotes, values containing line breaks, etc.), and you want to avoid re-inventing the wheel. – Heinzi Jul 25 '22 at 13:25

1 Answer

You can use

(?<=,|^)(?<quote>"?)(?<value>.*?)\k<quote>(?=,|$)

See the regex demo.

Details:

  • (?<=,|^) - start of string or a location right after a comma
  • (?<quote>"?) - an optional double quote captured into Group "quote"
  • (?<value>.*?) - Group "value": any zero or more chars other than line break chars, as few as possible
  • \k<quote> - same char as in Group "quote"
  • (?=,|$) - a location right before a comma or end of string.
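
As a quick check, here is a minimal TypeScript sketch that runs the pattern above against the inputs from the question. The helper name `quoteFields` and the added `g` flag are assumptions for illustration, not part of the original answer, and the sketch assumes a runtime with lookbehind, named groups, and `String.prototype.matchAll` (ES2020 or later).

```typescript
// Pattern copied from the answer; the `g` flag is added so every field is visited.
const FIELD = /(?<=,|^)(?<quote>"?)(?<value>.*?)\k<quote>(?=,|$)/g;

// Hypothetical helper (not from the original post): returns each field
// of a single comma-separated line, wrapped in double quotes.
function quoteFields(line: string): string[] {
  return Array.from(line.matchAll(FIELD), (m) => `"${m.groups?.value ?? ""}"`);
}

// Inputs taken from the question, including the empty-field cases.
for (const line of ['1,1,-1', '"1","1","-1"', ',1,-1', ',"1",', ',,']) {
  console.log(JSON.stringify(line), "->", quoteFields(line).join(","));
}
```

Because the lookbehind only asserts that a comma precedes the field and does not consume it, an empty match at one position does not swallow the separator needed by the next field. As the comments above note, this only covers simple single-line input; escaped quotes or embedded line breaks still call for a real CSV parser.
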
Wiktor Stribiżew