How can I use C# to parse a CSV file like this?

"TeamName","PlayerName","Position"  "Chargers","Philip Rivers","QB"  "Colts","Peyton Manning","QB"  "Patriots","Tom Brady","QB"

Notice that there are no line breaks at all. Rows are separated only by the double spaces that sit outside the quoted values.


Jim G.
  • Is it possible that the file has only `\n` (LF), no `\r` (CR), and you've opened it in an editor that doesn't recognize Unix newlines? Or perhaps it's using another character as the record delimiter? Just want to rule out that possibility before going to what might be considered extreme measures. – madreflection Jan 16 '20 at 00:08
  • I think you are going to need to create a Regex to parse the lines, something like `("[^"]*",)*("[^"]*" )` (which doesn't match the last line). Once you have split the text into lines, the rest is easy. – Flydog57 Jan 16 '20 at 00:32
  • How about: Find: `(".*?"(?:,".*?")*) ` Replace: `$1\n` – Toto Jan 16 '20 at 14:19

2 Answers

Using @toto's ideas (and mine) in the comments, how about something like this.

Use a regex to match each row, then turn each match into a proper line by appending "\r\n" to it.

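 // Requires: using System.Text.RegularExpressions;
 const string input =
     "\"TeamName\",\"PlayerName\",\"Position\"  \"Chargers\",\"Philip Rivers\",\"QB\"  \"Colts\",\"Peyton Manning\",\"QB\"  \"Patriots\",\"Tom Brady\",\"QB\"";
 // A row is one or more non-empty quoted fields (each optionally followed by a comma),
 // terminated by the two spaces that separate rows.
 const string linePattern = "(?<Line>(\"[^\"]+\",?)+)  ";
 var lineRegex = new Regex(linePattern);

 // Replace each two-space row separator with a line break.
 var linesText = lineRegex.Replace(input, "${Line}\r\n");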

After this, linesText looks like a regular quote-delimited CSV file, and you can parse it with standard tools. When I run this code, linesText looks like this:

"TeamName","PlayerName","Position"
"Chargers","Philip Rivers","QB"
"Colts","Peyton Manning","QB"
"Patriots","Tom Brady","QB"
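
As a rough illustration of the "parse it using regular tools" step, here is a minimal sketch that turns linesText into one string array per row. The `QuotedCsv` class and `ParseLines` method are hypothetical names made up for this example, and the sketch assumes every field is wrapped in double quotes and contains no escaped quotes. A real CSV library would handle more cases.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedCsv
{
    // Hypothetical helper: returns one string[] per line, with the surrounding quotes removed.
    // Assumption: every field is quoted and contains no escaped ("") quotes.
    public static List<string[]> ParseLines(string linesText)
    {
        var rows = new List<string[]>();

        // Accept both "\r\n" and "\n" line endings and ignore empty lines.
        var lines = linesText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var line in lines)
        {
            // Each match is one quoted field; group 1 is the text between the quotes.
            var fields = Regex.Matches(line, "\"([^\"]*)\"")
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToArray();
            rows.Add(fields);
        }

        return rows;
    }
}
```

With the linesText shown above, `QuotedCsv.ParseLines(linesText)[1][1]` would be `Philip Rivers`.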
Flydog57

You can try the following.

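        // Requires: using System; using System.IO;
        // Split directly on the double space instead of replacing it with ';' first,
        // so semicolons already present in the data do not create extra rows.
        // Caveat: a double space inside a quoted value is still treated as a row break.
        var content = File.ReadAllText(@"path/to/csv");
        var result = content.Split(new[] { "  " }, StringSplitOptions.None);
        foreach (var str in result)
        {
            Console.WriteLine(str);
        }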
salli
  • That has the same problem as @kit. If a double space is embedded in one of the quoted fields, it will be recognized as a "line-end". – Flydog57 Jan 16 '20 at 22:28
  • @Flydog57 you are correct. I didn't think that would be the case in the dataset, given the sample provided. In that case I think a regex is the best way to go. – salli Jan 17 '20 at 02:55
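
The concern raised in these comments (a double space inside a quoted value being mistaken for a row break) can also be handled without a regex. What follows is only a sketch under stated assumptions: `RowSplitter` and `SplitRows` are hypothetical names invented for this example, and the code assumes that the quotes in the input are balanced. It walks the text once and treats two consecutive spaces as a row separator only when they occur outside double quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class RowSplitter
{
    // Hypothetical helper: splits on two consecutive spaces, but only outside double quotes.
    // Assumption: quotes in the input are balanced.
    public static List<string> SplitRows(string input)
    {
        var rows = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '"')
            {
                inQuotes = !inQuotes; // toggle on every quote character
            }

            // Two spaces in a row, outside quotes, mark the end of a row.
            bool isSeparator = !inQuotes && c == ' '
                && i + 1 < input.Length && input[i + 1] == ' ';
            if (isSeparator)
            {
                rows.Add(current.ToString());
                current.Clear();
                i++; // skip the second space of the separator
                continue;
            }

            current.Append(c);
        }

        // The last row has no trailing separator, so add whatever is left.
        if (current.Length > 0)
        {
            rows.Add(current.ToString());
        }

        return rows;
    }
}
```

For an input such as `"a","b  c"  "d","e"`, this keeps the double space inside `"b  c"` and returns two rows.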