1

I have to load the following CSV file:

head1, head2, head3, head4; head5
34 23; 2; "abc";"abc \"sdjh";8
34 23; 2; "abc";"abc 
sdj\;h
jshd";8
34 23; 2; "abc";"abc";8

The function must handle escape characters such as \" \; \n and \r, as well as line breaks inside the strings. Is there a good library for this?

bluish
magol
  • Possible duplicate of http://stackoverflow.com/questions/906841/csv-parser-reader-for-c – Paolo Tedesco Mar 25 '11 at 11:28
  • Looks like you can parse it with a regex without too much difficulty. How do you need the result? For example, in the first row, is "34 23" a text string, two numbers "34" and "23", or a single number "3423"? – SubniC Mar 25 '11 at 11:31
  • The first column is not important, so it can be ignored. I have tried with regex, but cannot get it to work when there are newline characters in the text (they are a mix of \n and \r\n, for some reason). – magol Mar 25 '11 at 11:52
  • possible duplicate of [Reading csv file](http://stackoverflow.com/questions/3507498/reading-csv-file) – Gabe Mar 25 '11 at 12:12
  • You can normalize the line breaks first, for example with text.Replace("\r\n", "\n"); that way all line breaks are consistent, and afterwards you can use the regex. – SubniC Mar 25 '11 at 12:26
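
As a minimal sketch of the line-break normalization suggested in the comments: the helper below converts both \r\n and lone \r to \n before any further parsing. The names `LineEndings` and `Normalize` are made up for illustration. The order of the two replacements matters, since handling \r\n first avoids turning it into two line breaks.

```csharp
using System;

public static class LineEndings
{
    // Hypothetical helper: maps "\r\n" and lone "\r" to "\n".
    // "\r\n" is replaced first so it does not become "\n\n".
    public static string Normalize(string text)
    {
        return text.Replace("\r\n", "\n").Replace("\r", "\n");
    }
}
```

Note that this also rewrites line breaks inside quoted fields, which may or may not be acceptable for the data at hand.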

3 Answers

3

I've had good results using CSV Reader for .Net: http://www.codeproject.com/KB/database/CsvReader.aspx.

Fabian Nicollier
  • +1, I've had good results too. Also, you can use Excel as an ODBC driver for reading csv files... – Daren Thomas Mar 25 '11 at 12:42
  • I've tried the ODBC drivers as well. It works but it's actually more difficult and less flexible. Perhaps for databinding or querying it's better to use ODBC. – Fabian Nicollier Mar 25 '11 at 15:08
1

That's not a valid CSV file...

The header row would be interpreted as

"head1"," head2"," head3"," head4; head5"

Every other row only has a single column in it.

I don't think any library will be able to handle this out of the box. It looks like the header row has more than one delimiter, and all the other rows might have multiple delimiters too. If you also provided what the actual columns were, it would be easier to help with.

You could give CsvHelper (a library I maintain) a try. It is pretty flexible. You could change the configuration for the headers and rows and make them different. You can set what you want the delimiter and quoted field to be. It also handles line endings of \r, \n, and \r\n even if every line is using a different line ending.
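
To make the requirements concrete (custom delimiter, quoted fields, backslash escapes, mixed line endings), here is a small hand-rolled sketch. It is not CsvHelper's API or implementation; `SemicolonCsvSketch` and `Parse` are hypothetical names, and the escape rules are an assumption based on the sample in the question: \n and \r become control characters, and any other escaped character is taken literally.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SemicolonCsvSketch
{
    // Hypothetical helper: splits text into records, each a list of field values.
    public static List<List<string>> Parse(string text, char delimiter = ';')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '\\' && i + 1 < text.Length)
            {
                // Backslash escape: \n and \r become control characters;
                // any other escaped character (\" \; \\) is kept literally.
                char next = text[++i];
                field.Append(next == 'n' ? '\n' : next == 'r' ? '\r' : next);
            }
            else if (c == '"')
            {
                // Quotes toggle quoted mode and are not part of the value.
                inQuotes = !inQuotes;
            }
            else if (c == delimiter && !inQuotes)
            {
                record.Add(field.ToString());
                field.Clear();
            }
            else if ((c == '\n' || c == '\r') && !inQuotes)
            {
                // Treat \r\n as a single record terminator.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
            }
            else
            {
                // Ordinary character, or a line break inside a quoted field.
                field.Append(c);
            }
        }

        // Flush a final record that is not terminated by a line break.
        if (field.Length > 0 || record.Count > 0)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

The sketch does not trim whitespace around fields, so ` 2` keeps its leading space, and a blank line yields a record with one empty field. Whether that is the right behavior depends on what the actual columns are supposed to be.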

Josh Close
0

I couldn't get anything to pass all my tests for CSV parsing, so I ended up writing something simple to do it: AnotherCsvParser.

It does everything I need... but should be easy to fork and extend to your needs too.

Given:

 public class ABCD
 {
     public string A;
     public string B;
     public string C;
     public string D;
 }

It assumes the columns are in the order the fields are defined (but it would be easy to extend to read an attribute or something).

This works:

    var output = NigelThorne.CSVParser.ReadCSVAs<ABCD>(
        "a,\"b\",c,d\n1,2,3,4\n\"something, with a comma\",\"something \\\"in\\\" quotes\",\" a \\\\ slash \",\n,,\"\n\",");

Such that:

  Assert.AreEqual(4, output.ToArray().Length);
  var row1 = output.ToArray()[0];
  Assert.AreEqual("a", row1.A);
  Assert.AreEqual("b", row1.B);
  Assert.AreEqual("c", row1.C);
  Assert.AreEqual("d", row1.D);

Note: It's probably not very fast with lots of data either, but again, that's not a problem for me.
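
For illustration, the "columns in field order" idea can be sketched with reflection roughly as follows. This is not the actual AnotherCsvParser code; `FieldOrderMapper` and `Map` are hypothetical names. Reflection does not document an ordering for `GetFields`, so the sketch sorts by `MetadataToken`, which in practice tends to follow declaration order but is an assumption here, not a guarantee.

```csharp
using System;
using System.Linq;
using System.Reflection;

public class ABCD
{
    public string A;
    public string B;
    public string C;
    public string D;
}

public static class FieldOrderMapper
{
    // Hypothetical helper: assigns columns[i] to the i-th public instance field of T.
    // Only string fields are supported; other field types make SetValue throw.
    public static T Map<T>(string[] columns) where T : new()
    {
        // Assumption: sorting by MetadataToken approximates declaration order.
        FieldInfo[] fields = typeof(T)
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(f => f.MetadataToken)
            .ToArray();

        object target = new T(); // boxed so that struct targets also work
        for (int i = 0; i < fields.Length && i < columns.Length; i++)
        {
            fields[i].SetValue(target, columns[i]);
        }
        return (T)target;
    }
}
```

Usage would look like `var row = FieldOrderMapper.Map<ABCD>(new[] { "a", "b", "c", "d" });`. Reading an attribute on each field instead would make the column order explicit rather than inferred.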

Nigel Thorne