This is my regex:

var separator = '|';
Regex csvSplit = new Regex("(?:^|" + separator + ")(\"(?:[^\"]+|\"\")*\"|[^" + separator + "]+)", RegexOptions.Compiled);
var test = csvSplit.Matches("10734|Vls, p|6||1.5");

As you can see, there is one empty record.

This is what I get: [screenshot of the match results]

I was expecting an empty string at index 3, but it is skipped instead. What am I doing wrong?

petko_stankoski
  • Why don't you do it right from the start [using the CSV parser](http://stackoverflow.com/questions/6542996/how-to-split-csv-whose-columns-may-contain/6543418#6543418)? – Wiktor Stribiżew Jul 30 '18 at 08:41
  • `.Split(new [] { '|' }, StringSplitOptions.RemoveEmptyEntries)` Is regex mandatory, or is the basic [Split](https://msdn.microsoft.com/en-us/library/tabh47cf(v=vs.110).aspx) enough? – Drag and Drop Jul 30 '18 at 08:43
  • Related https://stackoverflow.com/questions/7393119/c-splitting-a-string-and-not-returning-empty-string – Drag and Drop Jul 30 '18 at 08:44
  • @DragandDrop It is mandatory, I don't want a completely new solution, I just want to modify the regex. – petko_stankoski Jul 30 '18 at 08:45
  • Ok, then why don't you escape the separator in the first place? `|` is a special char, an alternation operator. Your pattern looks like `(?:^||)("(?:[^"]+|"")*"|[^|]+)` and `(?:^||)` is really fishy, did you mean `(?:^|\|)`? – Wiktor Stribiżew Jul 30 '18 at 09:02
  • Can you have a string like `|10734||1.5|"aa""bb"aa` (5 items in the output expected)? Or only strings like `|10734||1.5|"aa""bb"` are expected? – Wiktor Stribiżew Jul 30 '18 at 09:16
  • @WiktorStribiżew I can have both. – petko_stankoski Jul 30 '18 at 09:34
  • Good, then I know what regex will work for you. But another answer has been posted, so I will wait until they come up with their proposal. – Wiktor Stribiżew Jul 30 '18 at 09:35
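
The escaping point raised in the comments can be checked with a small sketch. It rebuilds the question's pattern with two switches: whether the separator goes through `Regex.Escape`, and whether the unquoted branch uses `+` or `*`. The class `SeparatorEscapingDemo` and its methods `BuildPattern` and `Fields` are hypothetical names introduced only for this illustration, and the `*` variant is an assumption about a minimal change, not a fix proposed by anyone in the thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorEscapingDemo
{
    // Hypothetical helper: rebuilds the question's pattern with two switches.
    public static string BuildPattern(string separator, bool escape, bool allowEmpty)
    {
        // Regex.Escape turns "|" into "\|", so it stops acting as an alternation operator.
        string sep = escape ? Regex.Escape(separator) : separator;
        // "+" requires at least one character per field; "*" also accepts an empty field.
        string quantifier = allowEmpty ? "*" : "+";
        return "(?:^|" + sep + ")(\"(?:[^\"]+|\"\")*\"|[^" + sep + "]" + quantifier + ")";
    }

    // Returns the text captured by group 1 for every match.
    public static string[] Fields(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }

    public static void Main()
    {
        const string input = "10734|Vls, p|6||1.5";

        // The question's pattern as written: (?:^||)("(?:[^"]+|"")*"|[^|]+)
        string original = BuildPattern("|", escape: false, allowEmpty: false);
        // Escaped separator, empty fields allowed (an assumption, see lead-in).
        string adjusted = BuildPattern("|", escape: true, allowEmpty: true);

        Console.WriteLine(original + " -> " + Fields(original, input).Length + " fields");
        Console.WriteLine(adjusted + " -> " + Fields(adjusted, input).Length + " fields");
    }
}
```

On the sample input, the original pattern yields four fields and the adjusted one yields five, with an empty string at index 3. Escaping alone does not change the count, because `[^|]+` still needs at least one character. The sketch was only checked against the question's input; a line that starts with the separator is not handled correctly by this variant.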

1 Answer

Try this regex instead:

(?:^|(?<=\|))((?:"[^"]*"|[^|])*)(?=\||$)
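
As a rough usage sketch, the pattern above can be dropped into the question's C# code as follows. The class name `PipeFieldSplitter` and its `Split` method are hypothetical, introduced only for this example; the pattern is written as a verbatim string, so each `""` stands for one literal quote.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeFieldSplitter
{
    // The regex from this answer as a verbatim string ("" is an escaped quote).
    private static readonly Regex FieldRegex = new Regex(
        @"(?:^|(?<=\|))((?:""[^""]*""|[^|])*)(?=\||$)",
        RegexOptions.Compiled);

    // Hypothetical helper: returns group 1 of every match, empty fields included.
    public static string[] Split(string line)
    {
        return FieldRegex.Matches(line)
                         .Cast<Match>()
                         .Select(m => m.Groups[1].Value)
                         .ToArray();
    }

    public static void Main()
    {
        string[] fields = Split("10734|Vls, p|6||1.5");
        for (int i = 0; i < fields.Length; i++)
        {
            // Quotes around the value make the empty field visible in the output.
            Console.WriteLine(i + ": '" + fields[i] + "'");
        }
    }
}
```

With the question's input this should list five fields, the one at index 3 being empty. Quoted fields keep their surrounding quotes in the captured value, so `a|"b|c"|d` comes back as `a`, `"b|c"`, `d`; stripping the quotes would be a separate step.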
blhsing
  • Your pattern ignores the fact that `|` inside a verbatim string literal should not be split upon. See `(\"(?:[^\"]+|\"\")*\"` in the original regex. Thus, it won't work with `|10734||1.5|"aa""bb|cc"` – Wiktor Stribiżew Jul 30 '18 at 09:26
  • @WiktorStribiżew Good point. I've updated my answer accordingly. Thanks. – blhsing Jul 30 '18 at 09:42
  • No, it does not work, try against [`|10734||1.5|"aa|""b|b"|"aa""bb|"aa|`](http://regexstorm.net/tester?p=%28%3f%3a%5e%7c%5c%7c%29%28%28%3f%3a%22%5b%5e%22%5d*%22%7c%5b%5e%7c%5d%29*%29%28%3f%3d%5c%7c%7c%24%29&i=%7c10734%7c%7c1.5%7c%22aa%7c%22%22b%7cb%22%7c%22aa%22%22bb%7c%22aa%7c) – Wiktor Stribiżew Jul 30 '18 at 09:43
  • But [it still fails to match the empty string at the start](http://regexstorm.net/tester?p=%28%3f%3a%5c%7c%7c%5e%29%28%28%3f%3a%22%5b%5e%22%5d*%22%7c%5b%5e%7c%5d%29*%29%28%3f%3d%5c%7c%7c%24%29&i=%7c10734%7c%7c1.5%7c%22aa%7c%22%22b%7cb%22%7c%22aa%22%22bb%7c%22aa%7c). – Wiktor Stribiżew Jul 30 '18 at 09:51
  • Indeed. Fixed now. Thanks. – blhsing Jul 30 '18 at 10:08