
I have a big CSV file with 10 million entries, and I need to export it to SQL Server using C#. I'm a newbie and I really don't know how to write this. I have something like this so far:

    private static void ExportToDB()
    {
        SqlConnection con = new SqlConnection(@"Data Source=SHAWHP\SQLEXPRESS;Initial Catalog=FOO;Persist Security Info=True;User ID=sa");
        string filepath = @"E:\Temp.csv";
        StreamReader sr = new StreamReader(filepath);
        string line = sr.ReadLine();
        string[] value = line.Split(',');
        DataTable dt = new DataTable();
        DataRow row;
        foreach (string dc in value)
        {
            dt.Columns.Add(new DataColumn(dc));
        }

        while (!sr.EndOfStream)
        {
            value = sr.ReadLine().Split(',');
            if (value.Length == dt.Columns.Count)
            {
                row = dt.NewRow();
                row.ItemArray = value;
                dt.Rows.Add(row);
            }
        }
        SqlBulkCopy bc = new SqlBulkCopy(con.ConnectionString, SqlBulkCopyOptions.TableLock);
        bc.DestinationTableName = "tblparam_test";
        bc.BatchSize = dt.Rows.Count;
        con.Open();
        bc.WriteToServer(dt);
        bc.Close();
        con.Close();
    }

And it gives me an error saying this: An unhandled exception of type 'System.OutOfMemoryException' occurred in mscorlib.dll.

How can I fix it? Or is there another way?

Rudi
  • It might help to know *where* you get this exception, so you can determine if it's the file-reading, or the database-writing that's tripping you up. – J. Steen Jul 23 '13 at 09:35
  • It is where it's trying to read the CSV: while ( !sr.EndOfStream ) { value = sr.ReadLine().Split(','); – user2553371 Jul 23 '13 at 09:37
  • What about just using [BULK INSERT](http://msdn.microsoft.com/fr-fr/library/ms188365.aspx) ? – LMeyer Jul 23 '13 at 09:46
  • Have you tried a different approach? Without loading the whole file into memory, but parsing it row by row while inserting? – giammin Jul 23 '13 at 10:26

3 Answers


If you can get the file onto the server, I would use a server-side bulk insert.

BULK Insert CSV
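
A minimal sketch of kicking that off from C#, assuming tblparam_test already exists with matching columns and that E:\Temp.csv is a path the SQL Server service account can read (BULK INSERT runs on the server, not on your client):

    // using System.Data.SqlClient;
    private static void BulkInsertServerSide()
    {
        string conStr = @"Data Source=SHAWHP\SQLEXPRESS;Initial Catalog=FOO;Persist Security Info=True;User ID=sa";
        string sql = @"BULK INSERT tblparam_test
                       FROM 'E:\Temp.csv'
                       WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)";
        using (SqlConnection con = new SqlConnection(conStr))
        using (SqlCommand cmd = new SqlCommand(sql, con))
        {
            cmd.CommandTimeout = 0; // 10 million rows can take a while
            con.Open();
            cmd.ExecuteNonQuery(); // FIRSTROW = 2 skips the header line
        }
    }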

Regards.


Taken from MSDN:

In relation to `.ReadLine()`:

> If the current method throws an OutOfMemoryException, the reader's position in the underlying Stream object is advanced by the number of characters the method was able to read, but the characters already read into the internal ReadLine buffer are discarded. If you manipulate the position of the underlying stream after reading data into the buffer, the position of the underlying stream might not match the position of the internal buffer. To reset the internal buffer, call the DiscardBufferedData method; however, this method slows performance and should be called only when absolutely necessary.
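
In practice this matters only if you seek the underlying stream between reads. A minimal sketch of the pattern, using the same file path as the question:

    // using System.IO;
    using (FileStream fs = File.OpenRead(@"E:\Temp.csv"))
    using (StreamReader sr = new StreamReader(fs))
    {
        string header = sr.ReadLine();  // the reader has buffered past this point
        fs.Seek(0, SeekOrigin.Begin);   // reposition the underlying stream...
        sr.DiscardBufferedData();       // ...then reset the reader's internal buffer
        string again = sr.ReadLine();   // reads the header again, as expected
    }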

Aaron

You can't use such an approach, because string.Split creates lots of arrays that multiply the amount of memory. Suppose you have 10 columns. After the split you will have an array of length 10 plus 10 strings = 11 objects. Each of them carries 8 or 16 bytes of extra memory (object header, sync root, etc.), so the overhead is roughly 88 bytes for each line. 10 million lines will consume at least 880 MB of memory; add the size of your file to that and you reach about 1 GB. That is not all: DataRow is quite a heavy structure, so you should add 10 million data rows on top. And still not all: a DataTable with 10 million elements will itself take more than 40 MB. So the expected required size is more than 1 GB.

For an x32 (32-bit) process, .NET can't easily use more than 1 GB of memory. Theoretically it has 2 GB, but that is only theoretical, because everything consumes memory: assemblies, native DLLs, other objects, the UI, and so on.
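
If you are not sure which kind of process you are running, the runtime can tell you (Environment.Is64BitProcess, available since .NET 4):

    // True in an x64 process; false under x86, where the ~1 GB practical cap applies.
    Console.WriteLine(Environment.Is64BitProcess ? "x64 process" : "x86 process");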

The solution is to use an x64 process, or to read and write in chunks, like below:

    // Requires: System.Data, System.Data.SqlClient, System.IO
    private static void ExportToDB()
    {
        string filepath = @"E:\Temp.csv";
        using (StreamReader sr = new StreamReader(filepath))
        {
            // Build the columns from the header line.
            string line = sr.ReadLine();
            string[] value = line.Split(',');
            DataTable dt = new DataTable();
            DataRow row;
            foreach (string dc in value)
            {
                dt.Columns.Add(new DataColumn(dc));
            }

            int i = 1000; // chunk size
            while (!sr.EndOfStream)
            {
                i--;
                value = sr.ReadLine().Split(',');
                if (value.Length == dt.Columns.Count)
                {
                    row = dt.NewRow();
                    row.ItemArray = value;
                    dt.Rows.Add(row);
                }
                if (i > 0)
                    continue;
                WriteChunk(dt);
                i = 1000;
            }
            WriteChunk(dt); // flush whatever rows are left over
        }
    }

    private static void WriteChunk(DataTable dt)
    {
        const string conStr = @"Data Source=SHAWHP\SQLEXPRESS;Initial Catalog=FOO;Persist Security Info=True;User ID=sa";
        using (SqlBulkCopy bc = new SqlBulkCopy(conStr, SqlBulkCopyOptions.TableLock))
        {
            bc.DestinationTableName = "tblparam_test";
            bc.BatchSize = dt.Rows.Count;
            // When constructed from a connection string, SqlBulkCopy
            // opens and closes its own connection.
            bc.WriteToServer(dt);
        }
        dt.Rows.Clear(); // reuse the same DataTable for the next chunk
    }
  • I used your code, and I got an error in WriteChunk saying something like this: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: SQL Network Interfaces, error: 26 - Error Locating Server/Instance Specified), when I try to open the connection – user2553371 Jul 23 '13 at 10:09
  • I changed some code, but this error is caused by your environment. –  Jul 23 '13 at 10:18
  • Yes, it does not recognize the SQL Server in the second function, in WriteChunk... I am so close to getting this done thanks to you! – user2553371 Jul 23 '13 at 10:24