
I would like a regex-based CSV parser with one requirement: the regex must check that the CSV line ends with a comma (,); otherwise the regex must not consider the line valid. If the line is valid, I want to extract the text between the commas.

Example:

hello,world,end, //OK. CSV ends with ",". There are 3 matches: 'hello' 'world' 'end'

aa,bb,cc //NOT ok. CSV doesn't end with ",". No matches.

I have a question about the regex suggested by @Denomales. If the CSV contains quoted parts, there is always a match that contains only a quotation mark. Is it possible to avoid that?
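For the unquoted case, the requirement can be sketched as follows. This is a minimal sketch that assumes Python's `re` module (the question does not name a language or regex flavor) and uses the basic pattern posted in the comments on this question; `extract_fields` is a hypothetical helper name.

```python
import re

# Basic pattern from the comments: the look-ahead (?=.*,$) only succeeds when
# the rest of the line ends with a comma; each field is a run of non-comma
# characters followed by a comma.
FIELD = re.compile(r'(?=.*,$)([^,]+)(?:,)')

def extract_fields(line):
    """Return the comma-terminated fields, or [] if the line does not end with ','."""
    return FIELD.findall(line)

print(extract_fields("hello,world,end,"))  # ['hello', 'world', 'end']
print(extract_fields("aa,bb,cc"))          # []
```

Because the field must contain at least one character (`[^,]+`), empty fields are skipped, and quoted fields are not handled at all by this pattern.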

deemon
  • Please check my updated answer, where I modified [your regex](http://stackoverflow.com/a/30416711/3832970). – Wiktor Stribiżew May 23 '15 at 20:19
  • You should have included your regex here, you would not have a down vote. – Wiktor Stribiżew May 23 '15 at 20:22
  • basic regex `(?=.*,$)([^,]+)(?:,)` – Miguel Mota May 23 '15 at 20:25
  • You are not doing anything related to CSV, so why is it in the question topic? `Does string end with a comma` should be the topic. For that it's `,$`. What does CSV have to do with it? –  May 23 '15 at 20:38
  • What language do you use? Could you add pertinent examples (in particular with quotes)? – Casimir et Hippolyte May 23 '15 at 20:51
  • You can _split_ csv, but if the values can be quoted it throws everything off. –  May 23 '15 at 21:26
  • The simplest way to parse dbl quoted csv: `"` is escape char `(?:^|,)\s*("[^"]*(?:""[^"]*)*"|[^,]*?)\s*(?=,|$)` or '\' is escape `(?:^|,)\s*("[^"\\]*(?:\\.[^"\\]*)*"|[^,]*?)\s*(?=,|$)`. There are many more options that can be done. But _split_ on delimiter is not recommended. –  May 23 '15 at 21:42
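
The first pattern in the last comment (a doubled `"` as the escape character) can be tried out like this. This is a rough sketch assuming Python's `re` module, which the thread does not specify; `split_fields` is a hypothetical helper name. Note that this pattern does not enforce the trailing-comma requirement from the question.

```python
import re

# Pattern from the comment above: a field is either a double-quoted string in
# which "" stands for a literal quote, or a lazy run of non-comma characters.
CSV_FIELD = re.compile(r'(?:^|,)\s*("[^"]*(?:""[^"]*)*"|[^,]*?)\s*(?=,|$)')

def split_fields(line):
    """Return the captured fields; quoted fields keep their surrounding quotes."""
    return CSV_FIELD.findall(line)

print(split_fields('a,"b ""x"" c",d'))  # ['a', '"b ""x"" c"', 'd']
print(split_fields('a,b,'))             # ['a', 'b', '']
```

A line with a trailing comma yields an extra empty field at the end, so the result would need post-processing to match the output the question asks for.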

2 Answers


Regarding the regex suggested by @Denomales, there is something we can do about it.

To make sure we only match a string with a comma at the end, you can add a positive look-ahead at the beginning of the pattern (marked with ^ below):

(?=.*,$)(?:^|,)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)
^^^^^^^^       | 

And if you do not want to capture the entry-delimiting mark (the quotes), you can just remove the (?=[^"]|(")?) look-ahead that the original pattern has at the position marked with | above.

See demo

UPDATE

I see you have posted 2 answers to the above-mentioned thread. Your regex is almost all you need; just add the look-ahead and a way to skip escaped characters:

(?=.*,$)(?:"((?:\\.|[^"])*)"|([^,]*))(?:[,])

See Demo 2
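
To illustrate how the updated pattern behaves, here is a minimal sketch assuming Python's `re` module (the thread does not say which language is used); `FIELD_RE` and `parse_fields` are hypothetical names. Quoted fields land in group 1 and bare fields in group 2, so the helper picks whichever group matched.

```python
import re

# Updated pattern from this answer: the look-ahead demands a trailing comma;
# group 1 is the content of a quoted field (backslash escapes are skipped
# over), group 2 is a bare field. Every field must be followed by a comma.
FIELD_RE = re.compile(r'(?=.*,$)(?:"((?:\\.|[^"])*)"|([^,]*))(?:[,])')

def parse_fields(line):
    """Return field contents without surrounding quotes, or [] if invalid."""
    fields = []
    for m in FIELD_RE.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        fields.append(quoted if quoted is not None else bare)
    return fields

print(parse_fields('hello,"wor,ld",end,'))  # ['hello', 'wor,ld', 'end']
print(parse_fields('hello,"world",end'))    # []
```

The backslash of an escaped quote stays in the extracted text, so unescaping would be a separate step.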

Wiktor Stribiżew
  • Fails at `hello,"world",end`, `hello,"worl\"d",end,` and `hello,worl"d,end,` – Havenard May 23 '15 at 20:03
  • @Havenard: I think it should not match `hello,"world",end`. Did you read the question? The regex is taken from the [above mentioned SO post](http://stackoverflow.com/a/30362695/2604243) and adjusted according to the requirements. – Wiktor Stribiżew May 23 '15 at 20:05
  • @Havenard: Again, please read the question, OP does not mention the kind of CSV there is to parse, OP is interested in the regex from a specific SO post. – Wiktor Stribiżew May 23 '15 at 20:10
  • @Havenard: It appears that OP already solved the issues you mention above, just forgot to post those efforts here. I updated the regex and now it matches the strings you tested. Please check Demo 2. – Wiktor Stribiżew May 23 '15 at 20:29
  • `(?=[^"]|"?)` is an always true assertion. – Casimir et Hippolyte May 23 '15 at 20:47
  • @CasimiretHippolyte: Good catch :) Updated. – Wiktor Stribiżew May 23 '15 at 20:51
  • My mistake, in Denomales' pattern it is: `(?=[^"]|(")?)`; the goal was to define capture group 1 (needed for the following conditional test). – Casimir et Hippolyte May 23 '15 at 20:58
  • Your first regex has a hole in it. It's possible to drop malformed field data. `[^"]*|[^,"]*` never allows a malformed field with double quotes in it. –  May 23 '15 at 21:56

Thanks @stribizhev, it works perfectly.
If I understand correctly, the comma could also be escaped:
(?=.*,$)(?:"((?:\\.|[^"])*)"|((?:\\.|[^,])*))(?:[,])
Just for completeness :-)
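
The effect of allowing escaped commas in bare fields can be seen in a small sketch, again assuming Python's `re` module; `ESCAPED_RE` and `parse_escaped` are hypothetical names.

```python
import re

# Variant from this answer: in a bare field, a backslash followed by any
# character (including a comma) is consumed as one unit, so "\," no longer
# ends the field.
ESCAPED_RE = re.compile(r'(?=.*,$)(?:"((?:\\.|[^"])*)"|((?:\\.|[^,])*))(?:[,])')

def parse_escaped(line):
    """Return field contents; escape sequences are kept as written."""
    fields = []
    for m in ESCAPED_RE.finditer(line):
        quoted, bare = m.group(1), m.group(2)
        fields.append(quoted if quoted is not None else bare)
    return fields

print(parse_escaped(r'a\,b,c,'))  # ['a\\,b', 'c']
```

Here `a\,b` is returned as a single field, backslash included, where a pattern without the escape handling would stop at the escaped comma.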

deemon