26

I'm checking line by line in C#

Example data:

bob jones,123,55.6,,,"Hello , World",,0
jim neighbor,432,66.5,,,Andy "Blank,,1
john smith,555,77.4,,,Some value,,2

A regex that picks the commas outside of quotes is the closest I've come, but it doesn't handle the second line.

Dale K
Chris Hayes

5 Answers

60

Try the following regex:

(?!\B"[^"]*),(?![^"]*"\B)


Here is a demonstration:

regex101 demo


  • It does not match the second line, because the " in that line has no closing quotation mark.
  • It will not match values such as ,r"a string",10 because the letter adjacent to the " creates a word boundary rather than a non-word boundary.
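
To make the behaviour concrete, here is a minimal sketch that splits the first example line with the pattern above. It uses Python's re module rather than C#, on the assumption that both engines treat \B and lookaheads the same way for this input; the names OUTSIDE_QUOTES, line and fields are illustrative only.

```python
import re

# Pattern quoted from the answer. A comma is matched unless the next quote
# after it is followed by a non-word boundary (i.e. looks like a closing quote).
OUTSIDE_QUOTES = re.compile(r'(?!\B"[^"]*),(?![^"]*"\B)')

# First example line from the question.
line = 'bob jones,123,55.6,,,"Hello , World",,0'

# Split on the matched commas; the comma inside the quoted field is kept.
fields = OUTSIDE_QUOTES.split(line)
print(fields)
```

The quoted field comes back as a single element, quotes included, so the line yields eight fields.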

Alternative version

(".*?,.*?"|.*?(?:,|$))

This matches the content together with the commas, and it is compatible with values that are full of punctuation marks.

regex101 demo

Vasili Syrakis
2

The regex below parses each field in a line, not an entire line.

Apply the methodical and desperate regex technique: divide and conquer.

Case: field does not contain a quote

  • abc,
  • abc(end of line)

[^,"]*(,|$)

Case: field contains exactly two quotes

  • abc"abc,"abc,
  • abc"abc,"abc(end of line)

[^,"]*"[^"]*"[^,"]*(,|$)

Case: field contains exactly one quote

  • abc"abc(end of line)
  • abc"abc, (and there is no quote before the end of the line)

[^,"]*"[^,"]*$

[^,"]*"[^,"]*,(?!.*")

Now that we have all the cases, we then '|' everything together and enjoy the resultant monstrosity.
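
As a rough sketch of what the combined pattern could look like, the four cases are joined with '|' below and applied field by field. This is Python rather than C#, and it assumes the two one-quote patterns are meant to allow any run of non-comma, non-quote characters after the quote (hence the * after the second character class). The names FIELD and split_fields are made up for this illustration.

```python
import re

# One pattern per case from the answer; each matches a field plus its separator.
NO_QUOTE = r'[^,"]*(?:,|$)'
TWO_QUOTES = r'[^,"]*"[^"]*"[^,"]*(?:,|$)'
ONE_QUOTE_AT_END = r'[^,"]*"[^,"]*$'
ONE_QUOTE_NO_LATER_QUOTE = r'[^,"]*"[^,"]*,(?!.*")'

# '|' everything together, wrapping each case in a non-capturing group.
FIELD = re.compile("|".join(
    "(?:%s)" % case
    for case in (NO_QUOTE, TWO_QUOTES, ONE_QUOTE_AT_END, ONE_QUOTE_NO_LATER_QUOTE)
))

def split_fields(line):
    """Return the fields of one line, consuming one field per match."""
    fields = []
    pos = 0
    while True:
        match = FIELD.match(line, pos)
        if match is None:
            # None of the cases apply, e.g. a field with three quotes.
            raise ValueError("unparseable field at offset %d" % pos)
        text = match.group()
        if text.endswith(","):
            fields.append(text[:-1])  # drop the separating comma
            pos = match.end()
        else:
            fields.append(text)       # last field: matched up to end of line
            return fields

print(split_fields('bob jones,123,55.6,,,"Hello , World",,0'))
print(split_fields('jim neighbor,432,66.5,,,Andy "Blank,,1'))
```

With these assumptions, the unbalanced quote on the second line stays inside its field instead of swallowing the rest of the line.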

Community
twj
1

The top answer, by Vasili Syrakis, does not work with negative numbers inside quotation marks, such as:

bob jones,123,"-55.6",,,"Hello , World",,0
jim neighbor,432,66.5

The following regex works for this purpose:

,(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))

But I was not successful with this part of the input:

,Andy "Blank,
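
A small sketch of both outcomes, using Python's re.split instead of C# and assuming the lookaheads behave the same in both engines. The pattern splits on commas that are not followed by an odd number of quotes; the variable names are illustrative.

```python
import re

# Pattern quoted from this answer: a comma NOT followed by an odd number
# of quotes before the end of the line.
COMMA = re.compile(r',(?!(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$))')

# Balanced quotes, including a quoted negative number: splits as intended.
good = COMMA.split('bob jones,123,"-55.6",,,"Hello , World",,0')
print(good)

# Unbalanced quote: every comma before the stray quote has an odd number
# of quotes after it, so none of those commas is treated as a separator.
bad = COMMA.split('jim neighbor,432,66.5,,,Andy "Blank,,1')
print(bad)
```

The second call returns only three pieces, which is the failure described above.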
StilesCrisis
kitnarf
0

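Try this pattern:

".*?"(*SKIP)(*FAIL)|,

Note that (*SKIP) and (*FAIL) are PCRE backtracking control verbs, so this needs a PCRE-compatible engine; the .NET regex engine does not support them.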

alpha bravo
0
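import re

# Remove the commas that fall inside quoted fields (an odd number of quotes follows them).
string = 'bob jones,123,55.6,,,"Hello , World",,0'  # sample line from the question
print(re.sub(r',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)', "", string))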
Michał Perłakowski
Nithin