4

I'm reading huge CSV files (about 350K lines per file) this way:

StreamReader readFile = new StreamReader(fi);
string line;
string[] row;
readFile.ReadLine(); // skip the first line
while ((line = readFile.ReadLine()) != null)
{
    row = line.Split(';');
    x = row[1];
    y = row[2];
    // More code and assignments here...
}
readFile.Close();

The problem is that reading a huge file line by line for every day of the month may be slow, and I think there must be a faster way to do it.

  • 1
    Any method will have to read the entire file. – CodeCaster Feb 16 '16 at 16:05
  • Can you add a small example of the .csv you are wanting to parse? It may help us to see what you're trying to parse. – Vahlkron Feb 16 '16 at 16:06
  • 5
    Possible duplicate of [CSV parser/reader for C#?](http://stackoverflow.com/questions/906841/csv-parser-reader-for-c) – kayess Feb 16 '16 at 16:06
  • 2
    Just to save you some time, the last time I benchmarked a `ReadLine()` loop versus a custom buffer-based method (that didn't create strings but rather small value type offset-size pairs into the buffer) versus `ReadAllLines()`, the `ReadLine()` loop came out on top. Concentrate on optimizing your processing instead. – Cameron Feb 16 '16 at 16:09
  • 2
    Be careful using `Split` and `ReadLine` to parse csv as you'll read it incorrectly if there are separator characters or newline characters within quotes in the data. Using something like Microsoft.VisualBasic.FileIO.TextFieldParser is safer. – Andy Nichols Feb 16 '16 at 16:15
  • 1
    Could you describe your task in more detail: do you need all rows, or only particular ones based on some kind of id column? Do you need to display the data in the UI, where you could load it lazily page by page, or do you need to process the whole file? – alex.b Feb 16 '16 at 16:15
  • @alex.b The CSV that I'm using is a 350K lines file, and I need to process every line to construct a dictionary with all that data. Also, to process the data of a whole month, I need to process 31 files. – Fernando Gallego Fernández Feb 16 '16 at 17:21

2 Answers

25

Method 1

By using LINQ:

// ReadLines streams the file lazily; each line is split once on ';'
var Lines = File.ReadLines("FilePath");
var CSV = from line in Lines
          select line.Split(';');

Method 2

As Jay Riggs stated here

Here's an excellent class that will copy CSV data into a datatable using the structure of the data to create the DataTable:

A portable and efficient generic parser for flat files

It's easy to configure and easy to use. I urge you to take a look.

Method 3

Rolling your own CSV reader is a waste of time unless the files that you're reading are guaranteed to be very simple. Use a pre-existing, tried-and-tested implementation instead.
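
As one possible illustration of the "pre-existing implementation" route, here is a minimal sketch built on `Microsoft.VisualBasic.FileIO.TextFieldParser`, the class suggested in the comments on the question. The `QuotedCsvReader` class and its `ReadRows` method are hypothetical names invented for this example, and the `;` delimiter is assumed from the question's code; treat it as a starting point rather than a tuned solution.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvReader
{
    // Hypothetical helper: lazily yields the fields of each record,
    // so the whole file is never held in memory at once.
    public static IEnumerable<string[]> ReadRows(TextReader reader, string delimiter = ";")
    {
        // Disposing the parser also closes the underlying reader.
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(delimiter);
            // Delimiters that appear inside double quotes stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

It could be consumed with an ordinary loop, for example `foreach (var row in QuotedCsvReader.ReadRows(File.OpenText(path))) { ... }`, where `path` is a placeholder for one of the daily files. Correct handling of quotes has a cost, so it is worth measuring against the plain `Split` approach on real data before committing to either.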

Vignesh Kumar A
  • 6
    The very first operation is `File.ReadAllLines`, which pulls the entire file contents into memory before linq is even used. – gunr2171 Feb 16 '16 at 16:13
  • @gunr2171 I have changed it to ReadLines, since `File.ReadLines()` returns an `IEnumerable` and does not read the whole file in one go, so it is a better option when working with large files. – Vignesh Kumar A Feb 16 '16 at 16:40
  • 1
    Method 3 should be method 1. CSV's are complicated! – jpaugh Sep 11 '18 at 16:08
  • 1
    Method 1 doesn't work for '\n' (line feed) and/or ';' (semicolon) inside a quoted string. Parsing CSV with split is not possible in the general case. – mgueydan Oct 22 '18 at 15:08
9

In the simple case (no quotation marks, i.e. '"', anywhere in the file), when you expect to read only part of the file, you may find this useful:

  var source = File
    .ReadLines(fileName)
    .Select(line => line.Split(';'));

For instance, to find out whether the CSV contains a line whose 3rd column value equals 0:

  var result = source
    .Any(items => items[2] == "0");
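
The two snippets above can be combined into a self-contained sketch. The `CsvProbe` class and its `AnyRowHasValue` method are hypothetical names used only for illustration, and the length check is an addition that is not in the original query: it keeps a short or empty line from throwing an `IndexOutOfRangeException`. Because `File.ReadLines` is lazy and `Any` stops at the first match, reading ends as soon as a matching row is found.

```csharp
using System;
using System.IO;
using System.Linq;

public static class CsvProbe
{
    // Hypothetical helper: returns true if any row has `value` in the given
    // zero-based column. Assumes no quoted fields, as stated in the answer.
    public static bool AnyRowHasValue(string fileName, int columnIndex, string value, char separator = ';')
    {
        return File.ReadLines(fileName)                 // streams the file line by line
            .Select(line => line.Split(separator))      // naive split, no quote handling
            .Any(items => items.Length > columnIndex    // skip rows that are too short
                          && items[columnIndex] == value);
    }
}
```

A call such as `CsvProbe.AnyRowHasValue(fileName, 2, "0")` corresponds to the query in the answer.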
Dmitry Bychenko