
I have written C# code that copies data from a CSV file into a DataTable. The CSV file contains 5 million rows, and I read it line by line to prevent memory issues, yet I still get an OutOfMemoryException. I added breakpoints to make sure the right strings are copied into my variables, and they work correctly. Any ideas?

int first_row_flag = 0;   // the first row holds column names, so we skip it
string temp;
foreach (var row in File.ReadLines(path3))   // streams one line at a time
{
    if (!string.IsNullOrEmpty(row))
    {
        int i = 0;
        if (first_row_flag != 0)
        {
            dt.Rows.Add();
            foreach (string cell in row.Split(','))
            {
                if (i < 9)   // only the first 9 columns are expected
                {
                    temp = cell.Replace("\n", "");
                    temp = temp.Replace("\r", "");
                    dt.Rows[dt.Rows.Count - 1][i] = temp;
                    i++;
                }
            }
        }
        else
        {
            first_row_flag++;   // skip the header row
        }
    }
}

Each row has 9 columns. That's why I use i: to make sure I don't read unexpected data into a 10th column.

Here is the stack trace:

[stack trace screenshot]

• You're loading all this data into a DataTable at once? Seems like that's going to use a lot of memory... – mason Feb 15 '17 at 16:19
• You're still adding and adding and adding to (I'm guessing) a DataTable. – Kritner Feb 15 '17 at 16:19
• How exactly does reading line by line help here if you still hold all the rows in memory? – decPL Feb 15 '17 at 16:20
• Why would this not eventually have you run out of memory? If you do nothing but "put in" without ever "taking out", you're eventually going to run out. – Kritner Feb 15 '17 at 16:20
• I think the OP's concern is why creating 5 million rows (only the first cells of each are filled) causes OOM. Most likely it's due to running the code as a 32-bit process (see http://stackoverflow.com/questions/14186256/net-out-of-memory-exception-used-1-3gb-but-have-16gb-installed)... (ignoring the practical value of creating millions of such rows)... – Alexei Levenkov Feb 15 '17 at 16:28
• Check your value for `i` as you loop through. Your `i < 9` check seems very strange. Seems like you're using `i` to keep track of whether it's the first row or not, and then later expecting it to also represent the number of columns. You need separate variables for that. And variables that hold only two states should be a boolean, not an integer. – mason Feb 15 '17 at 16:37

1 Answer


Five million rows may simply be too much data to hold in memory at once; it depends on the number of columns and the size of the values. Compare the file size with the memory available to your process for a rough idea. The point is, with this much data you will end up with an out-of-memory exception with most other techniques too, as long as everything is kept in memory.

You should reconsider using a DataTable here. If you are holding the records so that you can later insert them into a database, process your data in small batches instead, flushing each batch to the database before reading the next.
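For example, here is a minimal sketch of the batching idea, assuming SQL Server as the destination; the connection string, the table name "dbo.MyTable", and the batch size are placeholders, not values from your code:

    using System.Data;
    using System.Data.SqlClient;
    using System.IO;

    // Placeholders for illustration -- adjust to your environment.
    string path3 = "data.csv";
    string connectionString = "<your connection string>";
    const int batchSize = 10000;

    var dt = new DataTable();
    for (int c = 0; c < 9; c++)
        dt.Columns.Add("col" + c);

    bool isHeader = true;
    foreach (var row in File.ReadLines(path3))
    {
        if (string.IsNullOrEmpty(row)) continue;
        if (isHeader) { isHeader = false; continue; }   // skip column names

        var cells = row.Split(',');
        var dr = dt.NewRow();
        for (int i = 0; i < 9 && i < cells.Length; i++)
            dr[i] = cells[i].Replace("\n", "").Replace("\r", "");
        dt.Rows.Add(dr);

        if (dt.Rows.Count == batchSize)
        {
            using (var bulk = new SqlBulkCopy(connectionString))
            {
                bulk.DestinationTableName = "dbo.MyTable";
                bulk.WriteToServer(dt);   // insert this batch
            }
            dt.Clear();   // free the rows so memory stays bounded
        }
    }

    if (dt.Rows.Count > 0)   // flush the final partial batch
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.MyTable";
            bulk.WriteToServer(dt);
        }
    }

The key point is the dt.Clear() after each batch: the DataTable never holds more than batchSize rows at a time, regardless of how large the file is.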

If you decide to handle the data in batches, you could even skip the DataTable entirely and use a List&lt;T&gt; instead.
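Something along these lines, where ProcessBatch is a hypothetical stand-in for whatever you do with each chunk (bulk insert, aggregation, etc.):

    using System.Collections.Generic;
    using System.IO;

    string path3 = "data.csv";   // placeholder path, as in the question
    const int batchSize = 10000;
    var batch = new List<string[]>(batchSize);

    bool isHeader = true;
    foreach (var row in File.ReadLines(path3))
    {
        if (string.IsNullOrEmpty(row)) continue;
        if (isHeader) { isHeader = false; continue; }   // skip the header row

        batch.Add(row.Split(','));
        if (batch.Count == batchSize)
        {
            ProcessBatch(batch);   // e.g. bulk-insert into the database
            batch.Clear();         // memory use is bounded by the batch size
        }
    }
    if (batch.Count > 0)
        ProcessBatch(batch);       // flush the final partial batch

    static void ProcessBatch(List<string[]> rows)
    {
        // placeholder: insert the rows, aggregate them, etc.
    }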

Also, look at other techniques for reading CSV files: Reading CSV files using C#
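For instance, the TextFieldParser class from Microsoft.VisualBasic.FileIO (usable from C# by referencing the Microsoft.VisualBasic assembly) reads one record at a time and, unlike a plain Split(','), copes with quoted fields that contain commas. A rough sketch, with the file path as a placeholder:

    using Microsoft.VisualBasic.FileIO;   // reference Microsoft.VisualBasic

    using (var parser = new TextFieldParser("data.csv"))   // placeholder path
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true;   // handles "a, b" style cells

        parser.ReadFields();                       // skip the header row
        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields(); // one parsed record at a time
            // process fields here, in batches as described above
        }
    }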

– Habib