4

I have some pretty typical code to parse a CSV file using a Microsoft.VisualBasic.FileIO.TextFieldParser:

using (TextFieldParser parser = new TextFieldParser(new StringReader(jobsReport)))
{
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;

    string[] headers = parser.ReadFields();

    while (!parser.EndOfData)
    {
        // ERROR OCCURS HERE
        string[] fields = parser.ReadFields();
        // ...
    }
}

However, I get the following exception:

Microsoft.VisualBasic.FileIO.MalformedLineException: Line 195 cannot be parsed using the current Delimiters.

Line 195 (according to Notepad++) is as follows:

0,0,0,0,0,0,0,2017-01-03T00:00:00,,"72d7a7e9-8700-4014-916c-a85e9a6b1ac5",1,REF212U,CREATED,"Evan job for ""Publish Bug""",Changzhou,China,31.77359,119.95401,Jiangsu

If I manually get rid of the double quotes and make the line this:

0,0,0,0,0,0,0,2017-01-03T00:00:00,,"72d7a7e9-8700-4014-916c-a85e9a6b1ac5",1,REF212U,CREATED,"Evan job for Publish Bug",Changzhou,China,31.77359,119.95401,Jiangsu

it works perfectly. I'm confused as to why this causes an issue, though, because according to this Q&A, this is the correct way to escape double quotes in a CSV file.

Also, Excel does just fine with this file:

enter image description here

This is exactly how I'd expect this to be handled.

Microsoft has a document that describes the error as follows:

The specified line cannot be parsed because it uses delimiters other than those specified.

It then rather unhelpfully suggests that I change the following to fix it:

Adjust Delimiters so the line can be parsed correctly, or insert exception-handling code in order to handle the line.

I don't understand how this applies, though - that doesn't appear to be the problem in this case because I was able to fix the problem by removing the double quotes inside the cell, not by changing anything about the delimiters.

My questions, then: why does the parser crash on this line, even though the escaping appears to be correct? And why this particular error message? And, of course, most importantly: how do I fix this?

Edit: Here are a few more of the surrounding lines:

"RBCD","0","0","0","1","0","4","0","2017-01-02T00:00:00","","f233e70a-293d-4953-b96d-79eb29261ea7","1","REF211I","OFFER","Planning Engineer","Wuxi","China"
"AFCO","0","0","0","0","0","0","0","2017-01-03T00:00:00","","15b6eda6-ce71-426d-8530-49c9044b3d62","0","REF214Q","CREATED","Technical Consultant","Shanghai","China"
"RBUS","0","0","0","0","0","0","0","2017-01-03T00:00:00","","a8f3930f-897c-4e4d-8b52-9029efdb9c65","0","REF215E","CREATED","Sr. Calibration Engineer","Farmington Hills","United States"
"PTCN","0","0","0","0","0","0","0","2017-01-03T00:00:00","","72d7a7e9-8700-4014-916c-a85e9a6b1ac5","1","REF212U","CREATED","Evan job for ""Publish Bug""","Changzhou","China"
"RBAC","0","0","0","0","0","0","0","2017-01-03T00:00:00","","fd643834-bafd-4674-aad9-4fe5cb0271cd","0","REF213F","CREATED","Test_Technical Consultant","Suzhou","China"
"RBCN","0","0","0","0","0","1","0","2017-01-04T00:00:00","","0c450e80-f64e-429d-9d85-3fd09a806d1f","1","REF218R","SOURCING","Test Job Role approver","Shanghai","China"
"","0","0","0","1","0","3","0","2017-01-04T00:00:00","","c952f49c-db78-4a86-8ad5-797e1b7f6933","1","REF217C","OFFER","quality engineer-SQE","Shanghai","China"
  • Have your tried setting the type `parser.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited`? – Racil Hilan Jan 22 '18 at 19:46
  • @RacilHilan Yes, unfortunately it didn't appear to help. – EJoshuaS - Stand with Ukraine Jan 22 '18 at 19:47
  • 3
    The source code of `TextFieldParser` is not public, but its Mono counterpart is, and if we assume that it's a faithful clone of the Microsoft library, the implementation [does not cover](https://github.com/mono/mono-basic/blob/master/vbruntime/Microsoft.VisualBasic/Microsoft.VisualBasic.FileIO/TextFieldParser.vb#L128) the escaped-quotes-in-quote case: while in a quoted token, it just interprets any double quote as end-of-quoted. If I'm not misinterpreting the source, this is clearly a shortcoming, if not a bug. – Cee McSharpface Jan 22 '18 at 19:52
  • I can not reproduce the error with given sample line. Without being able to inspect the surrounding text it is impossible to offer much help. The source text is likely malformed as the error indicates, but you will need to investigate that. – TnTinMn Jan 22 '18 at 21:34
  • @TnTinMn I edited to include a sample of a few surrounding lines if that's helpful. I couldn't find anything with them, but maybe I'm missing something. I guess you *could* argue that the fact that I'm posting the problem here implies that I *did* miss something, though :) – EJoshuaS - Stand with Ukraine Jan 22 '18 at 21:39
  • In your expanded sample , the problem line is not the same as originally posted. In the original - `"Evan job for ""Publish Bug""". In the expanded - `"Evan job for "Publish Bug""`. The expanded sample yields the error as one would expect it to do given that the enclosed quote is not properly escaped. – TnTinMn Jan 22 '18 at 21:50
  • @TnTinMn Yes, I appear to have copied from the wrong file - the original line is the correct version. – EJoshuaS - Stand with Ukraine Jan 22 '18 at 22:09
  • I had no issues parsing a text file created from your expanded sample. At this point, I can not offer any suggestions as any further investigation on my part would require access to the entire source text. I do suggest that you _dump_ the string jobsReport to a file and write a small test program to investigate it further. Good luck. – TnTinMn Jan 22 '18 at 23:07

2 Answers2

1

Afraid TextFileParser can handle this situation. You might have better luck with an oledb connection and reading it with a datareader. Or use other open source CSV parsers.

The one I can recommend is Cinchoo ETL

using (var p = new ChoCSVReader("QuoteInQuote.csv"))
{
    foreach (dynamic rec in p)
        Console.WriteLine(rec.DumpAsJson());
}

Output:

{
  "Column1": 0,
  "Column2": 0,
  "Column3": 0,
  "Column4": 0,
  "Column5": 0,
  "Column6": 0,
  "Column7": 0,
  "Column8": "2017-01-03T00:00:00",
  "Column9": null,
  "Column10": "72d7a7e9-8700-4014-916c-a85e9a6b1ac5",
  "Column11": 1,
  "Column12": "REF212U",
  "Column13": "CREATED",
  "Column14": "Evan job for \"Publish Bug\"", <-- Correct value
  "Column15": "Changzhou",
  "Column16": "China",
  "Column17": 31.77359,
  "Column18": 119.95401,
  "Column19": "Jiangsu"
}
Cinchoo
  • 6,088
  • 2
  • 19
  • 34
1

Have you tried the following? It worked for my case.

parser.HasFieldsEnclosedInQuotes = false;
EFKan
  • 634
  • 5
  • 9