I'm trying to parse a single line of a CSV file. Currently I'm testing with an online regex web page, but in the end it has to be implemented in C# (in reaction to a question in the comments).
I read a lot of other articles here on SO to figure it out by myself, but I'm stuck.
My test line for my regex looks like this (UPDATE: in the second line, quotes inside quoted strings are escaped by doubling them):
;;"test123;weiterer Text";;"Test mit " Zeichen im Spaltenwert";nächste Spalte mit " Begrenzungszeichen;"4711";irgendwas 123,4;1222;"foo"test"
;;"test123;weiterer Text";;"Test mit "" Zeichen im Spaltenwert";nächste Spalte mit "" Begrenzungszeichen;"4711";irgendwas 123,4;1222;"foo""test"
- ; is the delimiter
- " is the quote character for quoted columns
Problems:
- the line may contain empty columns (a semicolon directly followed by another semicolon)
- quoted strings may contain the quote character, like "Test mit " Zeichen im Spaltenwert"
- the column delimiter may also occur inside quoted strings, like "test123;weiterer Text"
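For comparison, the three rules above can also be handled without a regex. The following is a minimal C# sketch, assuming the escaping from the second (updated) test line, i.e. a literal quote inside a quoted column is written as "". It walks the line character by character and only treats " as an opening quote at the very start of a column. The class name CsvLineSplitter is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: splits ONE line, assuming
//  - ';' separates columns,
//  - a column that starts with '"' is quoted and may contain ';',
//  - inside a quoted column a literal quote is written as "" (as in the updated test line).
public static class CsvLineSplitter
{
    public static List<string> Split(string line, char delimiter = ';', char quote = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        int fieldStart = 0; // index where the current column begins

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];

            if (inQuotes)
            {
                if (c != quote)
                {
                    current.Append(c);              // ';' is plain text inside quotes
                }
                else if (i + 1 < line.Length && line[i + 1] == quote)
                {
                    current.Append(quote);          // "" -> literal "
                    i++;                            // skip the second quote
                }
                else
                {
                    inQuotes = false;               // closing quote
                }
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());     // also produces empty columns
                current.Clear();
                fieldStart = i + 1;
            }
            else if (c == quote && i == fieldStart)
            {
                inQuotes = true;                    // opening quote only at column start
            }
            else
            {
                current.Append(c);                  // quotes elsewhere are kept verbatim
            }
        }

        fields.Add(current.ToString());             // last column (may be empty)
        return fields;
    }
}
```

On the second test line this should give 10 columns, with test123;weiterer Text kept as one column and foo"test as the last one. Doubled quotes in an unquoted column (nächste Spalte mit "" Begrenzungszeichen) are deliberately left as they are.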
With a lot of googling and my limited understanding of regular expressions, this is the expression I have come up with so far:
(?<=^|;)(\".\"|[^;]*)|[^;]+
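To reproduce this in C# instead of the online tester, the expression can be run with Regex.Matches. A minimal sketch (the class name OriginalPatternCheck is hypothetical); the comment points at what looks like the cause of the split: in \".\" the dot matches exactly one character.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper that runs the expression from the question unchanged.
public static class OriginalPatternCheck
{
    // Verbatim string, so each " of the regex is written as "".
    // In \".\" the dot matches exactly ONE character, so that alternative only
    // accepts a one-character quoted value such as "x". A longer quoted value
    // falls through to [^;]*, which stops at the first ';'.
    public const string Pattern = @"(?<=^|;)(\"".\""|[^;]*)|[^;]+";

    public static List<string> Columns(string line) =>
        Regex.Matches(line, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();
}
```

For the input ;;"test123;weiterer Text";;"4711" this returns "test123 and weiterer Text" as two separate matches, the same split as in the result below.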
This gives the following result:
[0] =>
[1] =>
[2] => "test123
[3] => weiterer Text"
[4] =>
[5] => "Test mit " Zeichen im Spaltenwert"
[6] => nächste Spalte mit " Begrenzungszeichen
[7] => "4711"
[8] => irgendwas 123,4
[9] => 1222
[10] => "foo"test"
Tested with https://www.myregextester.com/
The problem I have now is with elements 2 and 3. This text
"test123;weiterer Text"
has to be one column, but it gets split at the semicolon inside the quoted string, although I thought I told the expression to match everything inside quotation marks.
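One possible regex-based approach, sketched under the assumption that quotes inside quoted columns are doubled as in the updated test line: let the quoted alternative consume any character except a lone quote (so ; is allowed inside), and require every column to end at ; or at the end of the line. The class name CsvRegexSplitter and the group names q and u are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes "" is the escape for a quote inside a quoted column.
public static class CsvRegexSplitter
{
    // (?<=^|;)                 a column starts at the line start or right after ';'
    // "(?<q>(?:[^"]|"")*)"     quoted column: non-quote chars (including ';') or a doubled quote
    // |(?<u>[^;]*)             otherwise: unquoted column up to the next ';' (may be empty)
    // (?=;|$)                  every column must end at ';' or at the end of the line
    private static readonly Regex Column = new Regex(
        @"(?<=^|;)(?:""(?<q>(?:[^""]|"""")*)""|(?<u>[^;]*))(?=;|$)");

    public static List<string> Split(string line) =>
        Column.Matches(line)
              .Cast<Match>()
              .Select(m => m.Groups["q"].Success
                  ? m.Groups["q"].Value.Replace("\"\"", "\"") // unescape "" inside quotes
                  : m.Groups["u"].Value)
              .ToList();
}
```

On the second test line this should yield 10 columns, with test123;weiterer Text as column 2 and foo"test as the last column. Because Regex.Matches continues scanning after the end of each match, the semicolon inside a successfully matched quoted column is never used as a column start.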
Any help here is highly appreciated. Thanks in advance.