
I wrote this regex

val regex = """(?<=,|^)(((?:")([^"]*)(?:"))|([^,]*))""".r

If I give an input line of

val input = "\"FOO,BAR\",\"10,1\",12,This is Test,X,X"

Now if I do

regex.findAllIn(input).matchData.foreach(println)

I can see

"FOO,BAR"
"10,1"
12
This is Test
X
X

My question is about the regex above. I explicitly put the " in a non-capturing group with (?:"), so I expected the output token to be FOO,BAR and not "FOO,BAR".

Why didn't the non-capturing group work as expected?

Edit: Based on one of the comments below, non-capturing groups are still matched and consumed. I tried to rewrite the expression with lookarounds as

val regex = """(?<=,|^)(((?<=")([^"]*)(?="))|([^,]*))""".r

but now it breaks altogether, because the first part of the OR expression never matches, and the output is

"FOO
BAR"
"10
1"
12
This is Test
X
X

So now it's only matching the second alternative, [^,]*.

I also googled and found this thread

Parsing CSV input with a RegEx in java

But the accepted answer there has the same problem as the one I describe above.

What I want to see as the output of the expression is

FOO,BAR
10,1
12
This is Test
X
X
Knows Not Much

1 Answer

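This is a bit convoluted, but it appears to work: when the quoted alternative matches, group 2 holds the text between the quotes; otherwise group 2 is null and the whole match is used.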

val regex = """(?<=,|^)("([^"]*)"|([^,]*))""".r
val input = "\"FOO,BAR\",\"10,1\",12,This is Test,X,X"

regex.findAllMatchIn(input).map { m =>
  // group(2) is null unless the quoted alternative matched
  Option(m.group(2)).getOrElse(m.group(0))
}.foreach(println)
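
As for why the quotes showed up in the first place: a non-capturing group only skips creating a numbered group; the characters it matches are still consumed and still part of the overall match, which is what findAllIn prints. A minimal sketch of that behaviour (the names quoted, whole, inner and groups are just for illustration):

```scala
import scala.util.matching.Regex

// (?:...) still consumes what it matches; it only avoids
// creating a numbered capture group.
val quoted: Regex = """(?:")([^"]*)(?:")""".r

val m = quoted.findFirstMatchIn("\"FOO,BAR\"").get

val whole  = m.group(0)    // the entire match, quotes included
val inner  = m.group(1)    // only the text between the quotes
val groups = m.groupCount  // the two (?:...) groups are not counted
```

Here whole still contains the surrounding quotes while inner does not, so the quotes have to be dropped by picking a capture group rather than by making the quote groups non-capturing.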

I have to agree that regex is not well suited to CSV parsing.
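
For comparison, here is a rough sketch of a hand-written splitter. It is not a full CSV parser: splitCsvLine is a hypothetical helper, and it assumes a single line, comma separators, optional double-quoted fields, and a doubled quote ("") inside a quoted field standing for a literal quote. For real data a dedicated CSV library is probably the safer choice.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper: splits one CSV line into its fields.
// Assumptions: comma separator, optional double-quoted fields,
// "" inside a quoted field means a literal quote, no multi-line fields.
def splitCsvLine(line: String): List[String] = {
  val fields   = ListBuffer.empty[String]
  val current  = new StringBuilder
  var inQuotes = false
  var i        = 0

  while (i < line.length) {
    val c = line.charAt(i)
    if (inQuotes) {
      if (c == '"') {
        if (i + 1 < line.length && line.charAt(i + 1) == '"') {
          current.append('"') // doubled quote becomes one literal quote
          i += 1              // skip the second quote of the pair
        } else {
          inQuotes = false    // closing quote of the field
        }
      } else {
        current.append(c)     // commas are ordinary characters in here
      }
    } else {
      c match {
        case '"' => inQuotes = true
        case ',' =>
          fields += current.toString // a field ends at an unquoted comma
          current.clear()
        case _ => current.append(c)
      }
    }
    i += 1
  }

  fields += current.toString // the last field has no trailing comma
  fields.toList
}
```

Calling splitCsvLine(input) on the line from the question gives the six fields FOO,BAR, 10,1, 12, This is Test, X and X, with the quotes removed.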

jwvh