18

Possible Duplicate:
CSV File Imports in .Net

In .NET, is there a standard library that should be used to read CSV files? All the samples on the web roll their own CSV reader/parser, or use OleDb.

It's not a problem to use any of these solutions; I was just wondering whether there is a generally accepted library (I can't find one), or any other "proper" way to do it.

Gareth
  • This is linked as a duplicate of another question that is currently closed. I vote for reopening. – cdiggins Oct 07 '18 at 02:50

7 Answers

19

CsvReader is a pretty good one. It isn't a Microsoft library, but it works very well, and it is a lot faster than some of the alternatives (legacy OleDb etc.).

Marc Gravell
  • This was the best library I could find. I had to fix a few bugs in it related to its more obscure features, but it was fine on the standard stuff. It is somewhat inefficient in that it does repeated string concatenation. With a bit of rejigging, I changed it to use StringBuilder in the main loop for most cases, and that gives a massive speed boost. – ShuggyCoUk Jul 09 '09 at 12:48
  • Yes, I'm investigating it at the moment - it seems to handle a lot of the edge cases. There doesn't seem to be anything else like it. – Gareth Jul 09 '09 at 12:50
  • I'm actually going to see if the author wants my code changes rather than trying to maintain it myself... – ShuggyCoUk Jul 09 '09 at 12:58
11

One of the reasons that many people write their own is that CSV isn't quite so simple. For example:

  1. Does the first row contain field names, or not?
  2. Do you support dates? If so, are they quoted, surrounded by # marks, in a certain day-month-year order?
  3. Does it support linefeeds that occur inside quoted text values? Or does that split the record?
  4. How do you escape a quote inside of a quoted string? Do you double the quote, or use a backslash or other escape character?
  5. What character encoding(s) are supported?
  6. How does it handle escaped control characters? &#XX; or \uXXXX or some other method?

These are some of the reasons people write their own parsers, because they're stuck reading files created with all these different settings. Or they write their own serializers, because the target system has a bunch of these idiosyncrasies.

If you don't care about these issues, just use the most convenient library. But understand they are there.
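
To make points 3 and 4 concrete, here is a minimal sketch (not taken from any library) of what goes wrong when a record is split on commas without regard to quoting. The class and method names are hypothetical, chosen only for illustration.

```csharp
using System;

public static class NaiveSplitDemo
{
    // Naive approach: treat every comma as a field separator.
    // This ignores quoting, so commas inside quoted values split the field.
    public static string[] NaiveSplit(string line)
    {
        return line.Split(',');
    }

    // A record with three logical fields:
    //   1
    //   Smith, John          (quoted because it contains a comma)
    //   He said "hi"         (inner quotes escaped by doubling them)
    public const string SampleRecord = "1,\"Smith, John\",\"He said \"\"hi\"\"\"";
}
```

Calling `NaiveSplitDemo.NaiveSplit(NaiveSplitDemo.SampleRecord)` returns four pieces rather than three, and the pieces still carry their raw quote characters. A line break inside a quoted value fails in a similar way if the file is read line by line.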

lavinio
  • These all sound like good reasons *not* to write your own, unless you want to repeat the same mistakes that others have already made (and possibly fixed). – LukeH Jul 09 '09 at 12:57
  • The CSV reader should deliver a list of rows containing strings or a corresponding iterable; how dates/numbers/whatever are stored inside a CSV field is the concern of a different layer in a non-monolithic app. Other points are all good reasons for having a CSV reading & writing package, NOT for doing it yourself. Many DIY efforts reinvent the wheel as a polygon with less than 6 sides and no axle :-) – John Machin Jul 10 '09 at 03:31
  • I agree. This is why I was encouraging people *not* to write their own, or if they *have* to, to think about the issues. – lavinio Jul 10 '09 at 03:35
  • @lavinio: You gave what you regarded as reasons why people write their own. There is nothing in your answer (even after your edit) that could in any way be construed as encouragement to do otherwise. – John Machin Jul 10 '09 at 04:57
  • I don't think you can properly make the decision until you understand what processes are going to be sharing the data. Ideally, you'd use a prepackaged one, but I've spent enough time trying to solve incompatibilities with this "simple" format to know otherwise. (I work for a data integration company.) – lavinio Jul 10 '09 at 14:06
11

The VB namespace has a great TextFieldParser class. I know, C# people don't like to reference a library from that 'basic' language, but it is quite good.

Its full name is Microsoft.VisualBasic.FileIO.TextFieldParser.

I used to mess with OleDb, creating column definition files etc., but I find TextFieldParser a very simple and handy tool for parsing any delimited file.
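
For anyone who hasn't used it, a minimal sketch of reading comma-delimited data with TextFieldParser might look like the following. The wrapper class and method names are hypothetical; the parser members used (TextFieldType, SetDelimiters, HasFieldsEnclosedInQuotes, EndOfData, ReadFields) are part of the class itself. The project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Reads every record from a comma-delimited source into a list of string arrays.
    // The parser takes ownership of the reader and disposes it when done.
    public static List<string[]> ReadAll(TextReader source)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "..." as a single field, so commas inside quotes are kept.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields throws MalformedLineException on a line it cannot parse.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk, pass `new StreamReader(path)`; for a quick check, `TextFieldParserDemo.ReadAll(new StringReader("id,name\r\n1,\"Smith, John\"\r\n"))` yields two rows, the second of which has "Smith, John" as a single field. Header handling is left to the caller: the first row is returned like any other.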

Roger Lipscombe
aSkywalker
  • Yes, I like this class too... but I really wonder why MS put it in a VB-specific assembly. It doesn't make any sense! – Thomas Levesque Jul 09 '09 at 13:24
  • @Thomas: VB programmers expect easy-to-use string parsing functions, whereas C-style programmers expect to suffer horribly when it comes to strings. – MusiGenesis Jul 09 '09 at 13:31
  • I have just recently discovered this class, and it was just what I was seeking. It is built-in, simple to use, and handles delimited fields with quotes. I recommend it for times when you don't need a complex solution, especially when working in an environment that is not very open to third-party libraries. – Mark Meuer Feb 18 '11 at 16:31
8

Try CsvHelper (a library I maintain). It's also available via NuGet.

CsvHelper allows you to read your CSV file directly into your custom class.

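// Assumes: using System.Collections.Generic; using System.IO; using System.Linq; using CsvHelper;
var streamReader = new StreamReader("file.csv"); // Placeholder path: create a reader for your CSV file.
var csvReader = new CsvReader( streamReader ); // Newer CsvHelper releases also take a CultureInfo as a second argument.
List<MyCustomType> myData = csvReader.GetRecords<MyCustomType>().ToList(); // Materialize the records into a list.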

CsvReader will automatically figure out how to match the property names based on the header row (this is configurable). It uses compiled expression trees instead of reflection, so it's very fast.

It is also very extensible and configurable.

Josh Close
  • Do you have any examples of this? The documentation on the project site is bare-bones. I want to read the whole file in and then allow the user to map headings to a list of attributes I define. – Andrew Mar 29 '11 at 22:25
  • It will hopefully be moving to the wiki in the GitHub repository soon. The way that site was created, using Sandcastle, was a major pain. If you have a specific question, ask a new SO question. – Josh Close Mar 30 '11 at 18:04
  • See this question: http://stackoverflow.com/questions/5496845/using-csvhelper-with-c-mvc-to-import-csv-files – Andrew Mar 31 '11 at 08:26
  • There is a documentation site now. http://joshclose.github.io/CsvHelper – Josh Close Jan 20 '14 at 16:26
3

KBCsv is another option, particularly if you require efficiency and the ability to work with massive CSV files.

Disclosure: I wrote KBCsv, hence the "KB" ;)

Kent Boogaart
2

After some more investigation, there is also this: http://www.filehelpers.com/

It seems to be a full framework for reading files, not just CSV files.

(Note: I have only read the material on the website; I have not used it yet.)

Gareth
0

I'm pretty sure you can read a CSV file into a DataTable with one line of code. Once it's in a DataTable, you can sort, filter, iterate etc.

This question has some examples for reading CSVs into DataTables.
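
The examples in the linked question are not reproduced here, so the following is only a sketch of one common way to do this, using the OleDb text driver mentioned in the question. It assumes Windows with the Jet 4.0 provider installed (which is 32-bit only) and a CSV file with a header row; the class and method names are hypothetical.

```csharp
using System.Data;
using System.Data.OleDb;
using System.IO;

public static class CsvToDataTable
{
    // Builds a connection string for the Jet text driver.
    // The data source is the folder that contains the file, not the file itself.
    public static string BuildConnectionString(string folder)
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
               ";Extended Properties=\"text;HDR=Yes;FMT=Delimited\"";
    }

    // Loads the whole CSV file into a DataTable; the file name acts as the table name.
    public static DataTable Load(string csvPath)
    {
        string fullPath = Path.GetFullPath(csvPath);
        string folder = Path.GetDirectoryName(fullPath);
        string fileName = Path.GetFileName(fullPath);

        var table = new DataTable();
        using (var adapter = new OleDbDataAdapter(
            "SELECT * FROM [" + fileName + "]", BuildConnectionString(folder)))
        {
            adapter.Fill(table);
        }
        return table;
    }
}
```

With this in place, the call site really is one line: `DataTable table = CsvToDataTable.Load(path);`. The driver infers column types from the data, which is convenient but gives little control when a file is malformed.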

MusiGenesis
  • I have been down this path many many times. When it works, it works fine - but debugging problems when it doesn't work is a massive pain. – aSkywalker Jul 09 '09 at 13:12
  • Huh? The only problems I've ever run into with CSV files were with the data itself (extra commas, missing quotation marks etc.), and those would be problems no matter what parsing method you use. – MusiGenesis Jul 09 '09 at 13:34
  • Yes, the problem is almost always with the data, but where in the data? We started switching over to the TextFieldParser class in every spot (hundreds of them) because of the limitations and lack of control of the old approach. It worked 90% of the time, but when it didn't, the error gave no help. We created a class library to work around this, scrubbing the raw data first etc. If you haven't tried the TextFieldParser class, you really should. We love it, and we parse millions of rows of CSV data each month. – aSkywalker Jul 10 '09 at 12:05