
I need to split a comma-separated string whose fields are sometimes enclosed in quotes. An example:

1,"TEST",22345,"18,95", Ab"cde

The first problem is to split the string only at commas that are not inside quotes, as in "18,95". I've done that with a simple Regex. The next problem is to strip the quotes where they surround a field: for "TEST" and "18,95" the quotes should be removed, while the quote in Ab"cde should be left untouched. Here is my code so far:

List<string> results = Regex.Matches(this.Content, @"[\""](.+?)[\""]|[^,]+")
                            .Cast<Match>()
                            .Select(m => m.Value)
                            .Select(s => s.StartsWith("\"") && s.EndsWith("\"") ? s.Remove(1,1).Remove(s.Length-1,1) : s)
                            .ToList();

For the second Select I get an ArgumentOutOfRangeException, apparently because the second Remove no longer works on s. I thought it should work, but somehow it doesn't.

If there is a better way to do this I would be happy to learn about it.

Romano Zumbé

4 Answers


You may capture the value between quotes and the unquoted value in named capture groups with the same name and retrieve the matched capture with Match.Groups["group_name"]:

List<string> results = Regex.Matches(this.Content, @"[\""](?<value>.+?)[\""]|(?<value>[^,]+)")
    .Cast<Match>()
    .Select(m => m.Groups["value"].Value)
    .ToList();

Demo: https://dotnetfiddle.net/M8lJDR

To take potential empty values into account, change + to * and surround the regex with (?<=^|,) and (?=,|$):

List<string> results = Regex.Matches(input,
        @"(?<=^|,)(?:""(?<value>.*?)""|(?<value>[^,]*))(?=,|$)")
    .Cast<Match>()
    .Select(m => m.Groups["value"].Value)
    .ToList();

Demo: https://dotnetfiddle.net/WqRD20
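
As a self-contained sketch of the pattern above, the following console program runs it against the sample line from the question and against 1,2,,4. The class name CsvSplitDemo and the method name SplitFields are made up for illustration; only Regex.Matches and the pattern itself come from this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CsvSplitDemo
{
    // Hypothetical helper: wraps the pattern shown above.
    public static List<string> SplitFields(string input)
    {
        return Regex.Matches(input,
                @"(?<=^|,)(?:""(?<value>.*?)""|(?<value>[^,]*))(?=,|$)")
            .Cast<Match>()
            .Select(m => m.Groups["value"].Value)
            .ToList();
    }

    public static void Main()
    {
        // Sample line from the question; brackets make whitespace visible.
        foreach (string field in SplitFields("1,\"TEST\",22345,\"18,95\", Ab\"cde"))
            Console.WriteLine("[" + field + "]");

        // Line with an empty third field.
        foreach (string field in SplitFields("1,2,,4"))
            Console.WriteLine("[" + field + "]");
    }
}
```

Unquoted fields keep their surrounding whitespace, so the last field of the sample line comes back with its leading space; call Trim() on the result if that is not wanted. A doubled quote inside a quoted field is returned as is rather than unescaped.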

Dmitry Egorov
  • Wow, that is a good one! Perfect. Do you perhaps also have an idea how I could adjust the regex to capture empty values as well? In 1,2,,4, for example, the third position is empty, which should result in an empty string – Romano Zumbé May 31 '17 at 11:24
  • Just change quantifiers from `+` to `*`. See the updated demo – Dmitry Egorov May 31 '17 at 11:26
  • I thought of that too, but unfortunately it adds an empty result after every 'normal' result which I would need to cleanup – Romano Zumbé May 31 '17 at 11:30
  • @RomanoZumbé: yeah, an extra delimiter check is required. Please see the updated answer. – Dmitry Egorov May 31 '17 at 11:58

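Remove returns a new string and leaves s unchanged, so after calling s.Remove(1,1) the resulting string is one character shorter than s, and s.Length-1 is no longer a valid start index for the second call. Note also that the opening quote sits at index 0, not 1, so the first call should be Remove(0,1).

Use

s.Remove(0,1).Remove(s.Length-2,1)

or

s.Remove(s.Length-1,1).Remove(0,1)

instead of

s.Remove(1,1).Remove(s.Length-1,1)

This will still fail when s="\"", because a single quote character satisfies both StartsWith and EndsWith but leaves nothing for the second Remove. To cover this, you have to extend the condition:

s.StartsWith("\"") && s.EndsWith("\"") && s.Length > 1 ? s.Remove(s.Length-1,1).Remove(0,1) : s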
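
The guarded expression could also be wrapped in a small helper. This is only a sketch, assuming that a lone quote character should be returned unchanged; the names QuoteStripper and StripQuotes are hypothetical.

```csharp
using System;

public static class QuoteStripper
{
    // Hypothetical helper: removes one pair of surrounding quotes, if present.
    public static string StripQuotes(string s)
    {
        // Length > 1 keeps a lone quote character from passing both tests.
        bool quoted = s.Length > 1 && s.StartsWith("\"") && s.EndsWith("\"");

        // Each Remove returns a new string; removing the last character first
        // means no index has to be adjusted for the shorter string.
        return quoted ? s.Remove(s.Length - 1, 1).Remove(0, 1) : s;
    }

    public static void Main()
    {
        Console.WriteLine(StripQuotes("\"18,95\""));
        Console.WriteLine(StripQuotes("Ab\"cde"));
        Console.WriteLine(StripQuotes("\""));
    }
}
```

In the question's pipeline, the second Select would then become .Select(QuoteStripper.StripQuotes).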
wkl

How about using .Substring():

List<string> results = Regex.Matches(content, @"[\""](.+?)[\""]|[^,]+")
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .Select(s => s.Length > 1 && s.StartsWith("\"") && s.EndsWith("\"")
                            ? s.Substring(1, s.Length - 2) : s)
                        .ToList();

Output:

1
TEST
22345
18,95
Ab"cde

Note: This doesn't work properly for parts that contain more than two quotes, e.g. ""test""work"",1

degant

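Maybe you could loop over results and strip a leading and a trailing quote from each entry. Note that this also strips a quote that appears at only one end: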

for (int i=0; i < results.Count; i++)
{
   if (results[i].StartsWith("\"")) 
       results[i] = results[i].Remove(0, 1);

   if (results[i].EndsWith("\""))  
      results[i] = results[i].Remove(results[i].Length - 1, 1);
}
pitersmx