To read a CSV file, I use the following statement:

var query = from line in rawLines
            let data = line.Split(';')
            select new
            {
                col01 = data[0],
                col02 = data[1],
                col03 = data[2]
            };

The CSV file I want to read is malformed in that a field can contain the separator ; itself as data when the field is enclosed in quotation marks.

Example:

col01;col02;col03
data01;"data02;";data03

My query above does not work here, because it splits the second row into four columns.

Question: Is there an easy way to handle this malformed CSV correctly? Perhaps with another LINQ query?

John Threepwood
  • Unfortunately it's not a CSV, as it's not comma separated. The likelihood is that you will need to write a small parser that checks for opening and closing quotation marks and ignores any special characters (e.g. the separator) between them. – WestDiscGolf Oct 28 '13 at 08:22
  • It is not malformed. There are several options, one would be to run through every character and remember the 'open' state of quotes. – Silvermind Oct 28 '13 at 08:23
  • This answer may help http://stackoverflow.com/questions/5567691/handling-commas-within-quotes-when-exporting-a-csv-file-c4-any-suggestions?rq=1 – WestDiscGolf Oct 28 '13 at 08:24
  • possible duplicate of [C#, regular expressions : how to parse comma-separated values, where some values might be quoted strings themselves containing commas](http://stackoverflow.com/questions/1189416/c-regular-expressions-how-to-parse-comma-separated-values-where-some-values) – Martin Liversage Oct 28 '13 at 08:24
  • @WestDiscGolf it definitely is a CSV. Normally the separator used by the operating system is defined in the regional settings, which in my country (and, I guess, for the invariant culture too) is a semicolon. – Silvermind Oct 28 '13 at 08:25
  • @Silvermind nice ... did not know that ... learnt something new today already. Thanks! :-) – WestDiscGolf Oct 28 '13 at 08:36
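A minimal sketch of the character-by-character approach described in the comments above: walk the line, remember whether a quote is currently open, and only treat ; as a separator when it is not. The class name `CsvLine` is hypothetical, and treating a doubled quote ("") inside a quoted field as a literal quote is an assumption borrowed from common CSV practice. It also does not handle quoted fields that span several lines.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not from the original post.
public static class CsvLine
{
    // Splits one line on `separator`, ignoring separators inside double-quoted fields.
    // Enclosing quotes are dropped; "" inside a quoted field becomes a literal quote (assumption).
    public static string[] Split(string line, char separator = ';')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false; // the 'open' state of quotes

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"'); // escaped quote inside a quoted field
                    i++;                 // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // opening or closing quote, not part of the data
                }
            }
            else if (c == separator && !inQuotes)
            {
                fields.Add(current.ToString()); // end of the current field
                current.Clear();
            }
            else
            {
                current.Append(c); // ordinary character, or a separator inside quotes
            }
        }

        fields.Add(current.ToString()); // the last field has no trailing separator
        return fields.ToArray();
    }
}
```

It can stand in for `line.Split(';')` in the question's query, i.e. `let data = CsvLine.Split(line)`; for the example row it yields `data01`, `data02;` and `data03`.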

3 Answers

Just use a CSV parser and STOP ROLLING YOUR OWN:

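using System;
using Microsoft.VisualBasic.FileIO; // TextFieldParser; needs a reference to Microsoft.VisualBasic

using (var parser = new TextFieldParser("test.csv"))
{
    parser.CommentTokens = new string[] { "#" };  // lines starting with # are skipped
    parser.SetDelimiters(new string[] { ";" });   // the file uses ; as the separator
    parser.HasFieldsEnclosedInQuotes = true;      // "data02;" is read as a single field

    // Skip over header line.
    parser.ReadLine();

    while (!parser.EndOfData)
    {
        // Enclosing quotes are removed, so fields[1] is data02; for the example row.
        string[] fields = parser.ReadFields();
        Console.WriteLine("{0} {1} {2}", fields[0], fields[1], fields[2]);
    }
}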

TextFieldParser is built into .NET. It lives in the Microsoft.VisualBasic.FileIO namespace, so just add a reference to the Microsoft.VisualBasic assembly and you are good to go. A real CSV parser will happily handle this situation.
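If you want to keep a LINQ query like the one in the question, one option is to wrap TextFieldParser in an iterator. This is a sketch: `CsvRecords`, `ReadRecords` and `Query` are hypothetical names, and it uses the TextFieldParser(TextReader) constructor so the input can come from `File.OpenText("test.csv")` or, for testing, from a `StringReader`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.VisualBasic.FileIO; // TextFieldParser

// Hypothetical helper, not from the original answer.
public static class CsvRecords
{
    // Yields one string[] per record; the parser removes the quotes around fields.
    public static IEnumerable<string[]> ReadRecords(TextReader reader, string delimiter = ";")
    {
        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(delimiter);
            parser.HasFieldsEnclosedInQuotes = true; // keeps "data02;" as one field

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }

    // The question's query on top of the parser; Skip(1) drops the header row.
    public static List<(string Col01, string Col02, string Col03)> Query(TextReader reader) =>
        (from data in ReadRecords(reader).Skip(1)
         select (Col01: data[0], Col02: data[1], Col03: data[2])).ToList();
}
```

Usage would be along the lines of `var rows = CsvRecords.Query(File.OpenText("test.csv"));`. Because `ReadRecords` is lazy, the file is only read as the query is enumerated.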

Darin Dimitrov

Parsing CSV files by hand tends to run into issues like this one. I would advise using a third-party library such as CsvHelper to handle the parsing.

Furthermore, it's not a good idea to hard-code the separator, as the list separator can be changed in your computer's regional settings.
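In .NET, the separator from the regional settings is exposed as `CultureInfo.CurrentCulture.TextInfo.ListSeparator`. The sketch below passes it to TextFieldParser (used here instead of CsvHelper, whose API this answer does not show); `RegionalCsv` and `ReadAll` are hypothetical names. It assumes the file was written with the same regional settings as the machine reading it, which is not guaranteed for files produced elsewhere.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser

// Hypothetical helper, not from the original answer.
public static class RegionalCsv
{
    // Reads every record, splitting on the list separator of `culture` instead of a
    // hard-coded character. Pass CultureInfo.CurrentCulture to use the regional settings.
    public static List<string[]> ReadAll(TextReader reader, CultureInfo culture)
    {
        var records = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.SetDelimiters(culture.TextInfo.ListSeparator);
            parser.HasFieldsEnclosedInQuotes = true; // separators inside quotes are data

            while (!parser.EndOfData)
            {
                records.Add(parser.ReadFields());
            }
        }
        return records;
    }
}
```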

Let me know if I can help further,

Matt

Matt Griffiths

Not very elegant, but after splitting with your method you can check whether any colxx contains an unmatched (single) quotation mark and, if so, join it with the next colxx, putting the ; back in between.
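A sketch of that idea, assuming a field "contains an unmatched quotation mark" when it holds an odd number of " characters. The class and method names are hypothetical. Note that the enclosing quotes are left in place (the example row gives `"data02;"`), so strip them afterwards if you need the bare value.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not from the original answer.
public static class FragmentJoiner
{
    // Splits naively on the separator (as the question does), then glues back any
    // fragment that was cut inside a quoted field: while the accumulated field holds
    // an odd number of quotes, the next fragment is appended with the separator restored.
    public static string[] SplitAndRejoin(string line, char separator = ';')
    {
        var result = new List<string>();
        string pending = null; // field still waiting for its closing quote

        foreach (string part in line.Split(separator))
        {
            pending = pending == null ? part : pending + separator + part;

            if (pending.Count(ch => ch == '"') % 2 == 0) // quotes balanced: field complete
            {
                result.Add(pending);
                pending = null;
            }
        }

        if (pending != null)
        {
            result.Add(pending); // unbalanced quote at the end of the line: keep what we have
        }
        return result.ToArray();
    }
}
```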

Ignacio Soler Garcia