8

Possible Duplicate:
Dealing with commas in a CSV file

I wrote myself a CSV parser it works fine until I hit this record: B002VECGTG,B002VECGTG,HAS_17131_spaceshooter,"4,426",0.04%,"4,832",0.03%,0%,1,0.02%,$20.47 ,1 The escaped , in "4,426" and in "4,426" brake my parser.

This is what I am using to parse the line of text:

            char[] comma = { ',' };
            string[] words = line.Split(comma);

How do I prevent my program from breaking?

Community
  • 1
  • 1
Joe Tyman
  • 1,417
  • 5
  • 28
  • 57

3 Answers3

13

You can't just split on comma. To implement a proper parser for that case, you need to loop through the string yourself, keeping track of whether you are inside quotes or not. If you are inside a quoted string, you should keep on until you find another quote.

IEnumerable<string> LineSplitter(string line)
{
    int fieldStart = 0;
    for(int i = 0; i < line.Length; i++)
    {
        if(line[i] == ',')
        {    
            yield return line.SubString(fieldStart, i - fieldStart);
            fieldStart = i + 1;
        }
        if(line[i] == '"')
            for(i++; line[i] != '"'; i++) {}
    }
}
Neil
  • 3,899
  • 1
  • 29
  • 25
Anders Abel
  • 67,989
  • 17
  • 150
  • 217
  • This method loves chopping up words. – Joe Tyman Sep 19 '11 at 20:04
  • I've submitted an edit for the out-by-one error in the SubString, and the missing curly braces on the second for loop. Additionally, this doesn't removed the quotes from quoted fields and has different behaviour from split when called on "" – Neil May 01 '14 at 17:20
  • 2
    This example drops the last item out of the list. – greggorob64 Aug 27 '14 at 20:40
  • 2
    IEnumerable LineSplitter(string line) { int fieldStart = 0; for (int i = 0; i < line.Length; i++) { if (line[i] == ',') { yield return line.Substring(fieldStart, i - fieldStart); fieldStart = i + 1; } if (line[i] == '"') for (i++; line[i] != '"'; i++) { } } yield return line.Substring(fieldStart, line.Length - fieldStart); } – MTs Apr 14 '15 at 12:21
5

I suggest using a CSV parser instead of trying to parse by yourself.

There are some nuances to parsing CSV correctly, as you have already found out.

There are many third party ones (and several of these are free), and even one built into the Visual Basic namespace - the TextFieldParser in the Microsoft.VisualBasic.FileIO namespace.

Oded
  • 489,969
  • 99
  • 883
  • 1,009
  • +1 for the `TextFieldParser` - why is it hidden in the VB namespace? – Anders Abel Sep 19 '11 at 18:35
  • @Anders - Good question... It is a really useful class. Though, for large CSV files, it is apparently slow. – Oded Sep 19 '11 at 18:40
  • Love the `TextFieldParser` here too! And also relevant to this answer for using a third-party utility, check out the FileHelpers library at http://www.filehelpers.com/ (LGPL licensing) for something not only very easy to use, but also as a great example to study and learn from if desired... – Funka Jan 18 '13 at 01:43
1

It is possible to use a Regex:

List<List<String>> rows = new List<List<String>>();
MatchCollection matches = Regex.Matches(input, @"^(?:(?:\s*""(?<value>[^""]*)""\s*|(?<value>[^,]*)),)*?(?:\s*""(?>value>[^""]*)""\s*|(?<value>[^,]*))$", RegexOptions.Multiline);
foreach(Match row in matches)
{
    List<String> values = new List<String>();
    foreach(Capture value in row.Groups["value"].Captures)
    {
        values.Add(value.Value);
    }
    rows.Add(values);
}

I do not suggest that it is the best solution, but for small files (a couple of rows) it probably isn't too bad.

Casperah
  • 4,504
  • 1
  • 19
  • 13