0

For a project, I need to split a string into a list of strings. The strings are built as follows:

string unprocessed = "\"foo,bar\",\"foobar\",\"shizzle ma nizzle\"";

I want to get it into an array like the following:

string[] processed = new[] { "\"foo,bar\"", "\"foobar\"", "\"shizzle ma nizzle\"" };

For this, I'm using a regex match that splits the string on the "," character combination. The code I have so far is as follows:

Regex reg = new Regex(@"((?!(,""|"",)).)+");
string regmatch = "\"\"wubba,lubba\",\"dup dub\"\"";
var matches =  reg.Matches(regmatch);

Assert.AreEqual(2, matches.Count);
Assert.AreEqual("\"dup dub\"\"", matches[1].Value); // passes
Assert.AreEqual("\"\"wubba,lubba\"", matches[0].Value); // fails because value = \"\"wubba,lubba

So far I'm getting one slight error, as seen in the example code: the first match is missing its trailing quote. I think I'm almost there. Can someone help me solve this regex issue? Or is there a better way to do this?

martijn

3 Answers

2

Just capture sequences that are enclosed in quotes and contain only non-quote characters:

var processed = Regex.Matches(unprocessed, "\"[^\"]+\"")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();

Output:

[
  "\"foo,bar\"",
  "\"foobar\"",
  "\"shizzle ma nizzle\""
]

If a simple enumerable is good enough for you, you can use a simple query instead:

var processed = from Match m in Regex.Matches(unprocessed, "\"[^\"]+\"")
                select m.Value;
Sergey Berezovskiy
  • You can use query syntax to produce a `List` as well by surrounding the query in parentheses followed by `.ToList()` – Abion47 Jan 10 '17 at 10:49
  • When I use this to separate """wubba,lubba"",""dub dub""" it gives me the wrong values for wubba,lubba and dub dub. See my example code – martijn Jan 10 '17 at 10:52
  • Also if you're only going to use the matches and not the groups, there's no reason to use grouping syntax in the pattern. It returns the exact same results without the parentheses in the pattern string. – Abion47 Jan 10 '17 at 10:52
  • @Abion47 it's a matter of taste, but I hate mixing query syntax with method syntax :) True about groups! – Sergey Berezovskiy Jan 10 '17 at 10:54
  • @martijn tested with `"\"\"wubba,lubba\",\"dup dub\"\""` output is `[ "\"wubba,lubba\"", "\"dup dub\""]` what's wrong with this output? – Sergey Berezovskiy Jan 10 '17 at 10:56
  • I would hope to get `"\"\"wubba,lubba\""` instead of `"\"wubba,lubba\""` – martijn Jan 10 '17 at 10:58
  • @SergeyBerezovskiy His `Assert` shows that he wants to capture both leading and following quotation marks, not just enclosing ones. – Abion47 Jan 10 '17 at 10:58
  • @martijn got you. Change the pattern to `"\"+[^\"]+\"+"`. That will capture more than one surrounding quote – Sergey Berezovskiy Jan 10 '17 at 10:58
  • @Abion47 yes, I see now. That was missing from the sample of `processed` at the beginning of the question – Sergey Berezovskiy Jan 10 '17 at 11:00
2

Since your requirement also mandates that you capture multiple redundant quotation marks in any given substring (why???), a tweak of Sergey Berezovskiy's pattern should yield the desired results:

var processed = Regex.Matches(unprocessed, "\"+[^\"]+\"+")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToList();
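
As a rough check against the test string from the question, the same pattern can be wrapped in a small helper. This is only a sketch: `QuotedSplitter` and `Split` are hypothetical names chosen for illustration, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class QuotedSplitter
{
    // One or more quotes, then one or more non-quote characters,
    // then one or more quotes.
    private static readonly Regex Pattern = new Regex("\"+[^\"]+\"+");

    // Returns every quoted substring, keeping all surrounding quotes.
    public static List<string> Split(string input)
    {
        return Pattern.Matches(input)
                      .Cast<Match>()
                      .Select(m => m.Value)
                      .ToList();
    }
}
```

Calling `QuotedSplitter.Split("\"\"wubba,lubba\",\"dup dub\"\"")` should return the two values the asserts in the question expect, `""wubba,lubba"` and `"dup dub""`. Note that the pattern requires at least one non-quote character between the quotes, so an empty field (`""`) is not handled correctly.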
Abion47
0

Parsing CSV with regex is the second worst method that I know of. For example, the value `a"b,c"` is written in CSV as `"a""b,c"""`, which can't be reliably parsed with regex and will leave the escaped `""` in the result.

I would recommend looking for a dedicated CSV parser like CsvReader, FileHelpers, LINQtoCSV, etc. If an external library is not an option, there is also Microsoft.VisualBasic.FileIO.TextFieldParser, which ships with the framework.
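
A minimal sketch of the `TextFieldParser` route, assuming the input is a single well-formed CSV line. `CsvLineParser` and `ParseLine` are hypothetical names used here for illustration, and on older .NET Framework projects a reference to Microsoft.VisualBasic.dll is needed.

```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper, named for illustration only.
public static class CsvLineParser
{
    // Parses one comma-delimited line and returns its fields.
    public static string[] ParseLine(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat quoted fields as a unit so commas inside them are kept.
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing left to read.
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the first string in the question, `CsvLineParser.ParseLine("\"foo,bar\",\"foobar\",\"shizzle ma nizzle\"")` yields `foo,bar`, `foobar` and `shizzle ma nizzle`. Unlike the regex answers, the enclosing quotes are stripped, so they would have to be added back if the quoted form is really required. The second test string in the question, with its stray leading and trailing quotes, is not valid CSV, so a strict parser may reject it.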

Parsing CSV files in C#, with header

Slai