I wrote this regex:
val regex = """(?<=,|^)(((?:")([^"]*)(?:"))|([^,]*))""".r
I use this input line:
val input = "\"FOO,BAR\",\"10,1\",12,This is Test,X,X"
Now if I do
regex.findAllIn(input).matchData.foreach(println)
I see this output:
"FOO,BAR"
"10,1"
12
This is Test
X
X
My question is about the regex above. I explicitly put the " inside a non-capturing group by writing (?:"),
so I expected the output token to be FOO,BAR and not "FOO,BAR".
Why didn't the non-capture group work as expected?
Edit: Based on one of the comments below, non-capturing groups are still matched and consumed. I tried to rewrite the expression as
val regex = """(?<=,|^)(((?<=")([^"]*)(?="))|([^,]*))""".r
but now it breaks altogether, because the first part of the OR expression never matches. The output is:
"FOO
BAR"
"10
1"
12
This is Test
X
X
So now it's only matching the second alternative, [^,]*.
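One possible explanation, as a minimal sketch: both lookbehinds are evaluated at the same position, so the character before the match would have to be a comma (or the start of the input) and a double quote at the same time. The name quotedOnly is just illustrative; it isolates the first alternative of the rewritten regex.

```scala
val input = "\"FOO,BAR\",\"10,1\",12,This is Test,X,X"

// First alternative only: (?<=,|^) and (?<=") both inspect the character
// immediately before the current position, so they cannot both succeed.
val quotedOnly = """(?<=,|^)(?<=")[^"]*(?=")""".r

println(quotedOnly.findFirstIn(input)) // None
```

If that reading is right, the quoted alternative can never match, which would explain why every token falls through to [^,]*.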
I also searched and found this thread:
Parsing CSV input with a RegEx in java
But the accepted answer has the same problem as the one I describe above.
What I want to see as the output of the expression is:
FOO,BAR
10,1
12
This is Test
X
X
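For reference, a hedged sketch of one way this output might be produced: keep the quotes in the overall match, but read the capturing groups instead of the whole match. The simplified pattern and the name tokens are assumptions for illustration, and the sketch assumes fields never contain escaped quotes ("").

```scala
val input = "\"FOO,BAR\",\"10,1\",12,This is Test,X,X"

// Group 1 captures the inside of a quoted field, group 2 an unquoted field.
// The quotes are still consumed by the match, just not captured.
val regex = """(?<=,|^)(?:"([^"]*)"|([^,]*))""".r

val tokens = regex.findAllMatchIn(input).map { m =>
  // group(1) is null when the quoted alternative did not participate
  Option(m.group(1)).getOrElse(m.group(2))
}.toList

tokens.foreach(println)
```

Printing the Match itself (as matchData.foreach(println) does) shows the entire matched text, which is why the quotes appear there regardless of how the groups are written.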