
Preface: This question is a derivative of this question.


Here is my code:

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        var parsedLines = Regex.Split(rawLine, "(\".*? \"(?:,\".*? \")*)");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}

Here is my output:

"TeamName","PlayerName","Position"  "
Chargers
","Philip Rivers","QB"  "
Colts
","Peyton Manning","QB"  "
Patriots","Tom Brady","QB"
Press [ENTER] to exit.

And here is my desired output:

"TeamName","PlayerName","Position"
"Chargers","Philip Rivers","QB"
"Colts","Peyton Manning","QB"
"Patriots","Tom Brady","QB"
Press [ENTER] to exit.

How can I fix the regex to generate my desired output?



Jim G.
  • I strongly recommend using CSVHelper instead of regex for this unless you're sure the CSV won't contain things like embedded/escaped quotes or commas. CSV is a *very* flexible format. It doesn't even have to be comma-separated to qualify as a CSV. – Jan 16 '20 at 18:26
  • @Amy I'm not familiar with CSVHelper, but I'm all ears. Can you please describe a solution with CSVHelper? – Jim G. Jan 16 '20 at 18:28
  • It has been a long while since I've had need of its services (no CSVs in the past couple of years :o ), but I remember their documentation was fairly good. See https://joshclose.github.io/CsvHelper/getting-started. – Jan 16 '20 at 18:31
  • You could just split on `"\" \""` and put the leading and trailing double quotes back on each result. – juharr Jan 16 '20 at 18:45
  • Or maybe even do a replace of `"\" \""` with `"\"\n\""` or whatever newline characters you desire. – juharr Jan 16 '20 at 18:55
  • Unless CSVHelper has a way to specify the *record* delimiter, it's not going to be of any help. An answer (since deleted) on the previous question mentioned the `Delimiter` property of CSVHelper's configuration but that's the *field* delimiter (and that's why it was deleted). I was unable to find a record delimiter option (but that doesn't mean there isn't one). The other answer there mentions splitting, both on two spaces and on quote-space-space-quote, so one would hope that was already explored before turning to regex and then posting this question. – madreflection Jan 16 '20 at 19:32
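
The plain-string approach from juharr's comments could look roughly like this. It is only a minimal sketch, assuming records are always separated by exactly quote, two spaces, quote, as in the question's rawLine. The names RecordSplitter, SplitRecords and ToLines are hypothetical and introduced here for illustration.

```csharp
using System;
using System.Linq;

static class RecordSplitter
{
    // Assumed record separator: closing quote, two spaces, opening quote.
    private const string Separator = "\"  \"";

    // Split on the separator, then put back the quotes that the split consumed.
    public static string[] SplitRecords(string rawLine)
    {
        string[] parts = rawLine.Split(new[] { Separator }, StringSplitOptions.None);
        int last = parts.Length - 1;
        return parts
            .Select((part, i) => (i > 0 ? "\"" : "") + part + (i < last ? "\"" : ""))
            .ToArray();
    }

    // Alternative: replace each separator with quote, newline, quote.
    public static string ToLines(string rawLine)
    {
        return rawLine.Replace(Separator, "\"" + Environment.NewLine + "\"");
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (string record in SplitRecords(rawLine))
            Console.WriteLine(record);
    }
}
```

Neither method understands quoting rules, so a field that itself contains quote, two spaces, quote would be split in the wrong place.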

3 Answers

Use a negative lookbehind, a positive lookbehind, a character class with a quantifier, a positive lookahead, and a negative lookahead.

Working Demo

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        var parsedLines = Regex.Split(rawLine, "(?<![,])(?<=[\"])[ ]{2}(?=[\"])(?![,])");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}
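
As a side note, the two negative assertions in that pattern look redundant: a character that must be a double quote cannot also be a comma. A shorter pattern that should behave the same way, sketched under the same assumption that records are separated by quote, two spaces, quote (LookaroundSplit and SplitRecords are hypothetical names used for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundSplit
{
    // Two spaces that are preceded by a quote and followed by a quote.
    // The lookarounds are zero-width, so the quotes stay in the results.
    private const string Pattern = "(?<=\")  (?=\")";

    public static string[] SplitRecords(string rawLine)
    {
        return Regex.Split(rawLine, Pattern);
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (string record in SplitRecords(rawLine))
            Console.WriteLine(record);
    }
}
```

Because the pattern has no capture groups, Regex.Split returns only the text between separators and adds no extra entries.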
Jim G.

As Amy has already mentioned, your string looks like CSV. If it really is valid CSV, use a dedicated library.

If CSVHelper isn't applicable in this case and you really need a regex, try something like this:

(?<=(?:^|  ))(.*?)(?=(?:  \")|$)

I haven't written C#, so the regex may need some adjustments for C#-specific syntax.

Edit. Code example.

using System;
using System.Linq;
using System.Text.RegularExpressions;

class MainClass {
  public static void Main (string[] args) {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        //var parsedLines = Regex.Split(rawLine, "(?<=(?:^|  ))(.*?)(?=(?:  \")|$)");
        var parsedLines = Regex.Split(rawLine, "(?<=^)(.*?)(?=(?:  \")|$)|(?<=  )(.*?)(?=(?:  \")|$)");
        parsedLines.ToList().ForEach(Console.WriteLine);

        Console.WriteLine("Press [ENTER] to exit.");
        Console.ReadLine();
  }
}

This code includes a "dirty" fix for an assertion error. However, I can't reproduce the error with an online tool :) The original regex is commented out in this example.

I hope this helps. But I must say again: if you are working with CSV, it is better to use dedicated tools, not regex :)
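
One thing worth noting is that the original pattern describes the records themselves rather than the separators, so it may fit Regex.Matches more naturally than Regex.Split. The following is only a sketch under that assumption, using the original (commented-out) pattern unchanged; RecordMatcher and MatchRecords are hypothetical names used for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class RecordMatcher
{
    // Original pattern from this answer: text that starts at the beginning of
    // the input or after two spaces, and runs lazily up to two spaces followed
    // by a quote, or up to the end of the input.
    private const string Pattern = "(?<=(?:^|  ))(.*?)(?=(?:  \")|$)";

    public static string[] MatchRecords(string rawLine)
    {
        return Regex.Matches(rawLine, Pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .Where(v => v.Length > 0) // defensive: drop any empty matches
            .ToArray();
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (string record in MatchRecords(rawLine))
            Console.WriteLine(record);
    }
}
```

Each match is one whole record, so no quotes need to be restored afterwards.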

MadRay
  • You can use dotnetfiddle or regexstorm for testing regex in c#. –  Jan 16 '20 at 18:39
  • Ouch, thanks. I missed that I can test on the repl.it link that you provided. The regex here seemed to be good. But if C# gets an assertion error, we can try a "dirty" fix. I'll edit my post right now. – MadRay Jan 16 '20 at 18:45

There are good comments throughout the thread (I would strongly suggest pursuing one of those options), so I won't focus on them. Here's an alternative solution that uses Matches with a regex pattern: each match is one field, so skip the fields of the records already printed (three columns per record) and take the next three to build each record.

I'm using a pattern like (\"(.*?)[^,]"); an explanation of what it means can be found here.

const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";                       
var matches = new Regex(@"(\""(.*?)[^,]"")").Matches(rawLine).Cast<Match>().ToList();
// Walk the matches three at a time (one record = three fields)
for (int i = 0; i * 3 < matches.Count; i++)
{
    // Join the three fields of record i back into one line
    string str = string.Join(",", matches.Skip(i * 3).Take(3));
    Console.WriteLine(str);
}
Console.WriteLine("Press [ENTER] to exit.");
Console.ReadLine();

Please note that there's no error checking at all and this can be improved, but it does produce the output you need. Also make sure you import System.Linq if it isn't already there.
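
As one possible improvement, the grouping could be wrapped in a helper that checks the field count before building records. This is a minimal sketch, not a full CSV parser: FieldGrouper and GroupFields are hypothetical names, the column count is assumed to be known in advance, and the pattern still needs at least one character inside each quoted field.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class FieldGrouper
{
    // Same field pattern as above: a quoted field whose last character is not a comma.
    private const string FieldPattern = @"(\""(.*?)[^,]"")";

    public static List<string> GroupFields(string rawLine, int fieldsPerRecord)
    {
        if (fieldsPerRecord <= 0)
            throw new ArgumentOutOfRangeException("fieldsPerRecord");

        List<string> fields = Regex.Matches(rawLine, FieldPattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();

        // Basic sanity check: every record must have the same number of fields.
        if (fields.Count % fieldsPerRecord != 0)
            throw new FormatException(
                "Found " + fields.Count + " fields, which is not a multiple of " + fieldsPerRecord + ".");

        var records = new List<string>();
        for (int i = 0; i < fields.Count; i += fieldsPerRecord)
            records.Add(string.Join(",", fields.Skip(i).Take(fieldsPerRecord)));
        return records;
    }

    public static void Main()
    {
        const string rawLine = "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
        foreach (string record in GroupFields(rawLine, 3))
            Console.WriteLine(record);
    }
}
```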


Trevor