
I load a large number of records into my application (1 million+) and do a ton of processing on them. The processing requires them all to be in memory.

Afterwards, I want to dump all of the (now modified) records into an empty table.

Loading the records takes mere seconds, and I end up with a large array of MyRecord items.

Saving using SqlBulkCopy takes mere seconds as well.

However, SqlBulkCopy requires (I believe) a DataTable - and loading my records into a DataTable is slow - approximately 7500 records per minute using

dataTable.Rows.Add(myRecord.Name, myRecord.Age, ....)

Is there a faster way of performing this middle step?

NibblyPig
  • Have you seen https://stackoverflow.com/questions/13722014/insert-2-million-rows-into-sql-server-quickly (guy there claims 1 million rows/sec)? Does your DataTable have any indexes, keys, expression columns, constraints, relations to other tables etc, or is it as simple as simple can be? – Caius Jard Nov 09 '17 at 16:53
  • My datatable is completely simple, some string, int and datetime fields with nothing extra – NibblyPig Nov 09 '17 at 16:58
  • His example is creating new rows and adding the data, which is what I'm doing too, and it's very slow. I'm not sure why he is using reflection when you don't need to do that. – NibblyPig Nov 09 '17 at 17:00
  • 1
    https://stackoverflow.com/questions/21779412/performance-issue-with-sqlbulkcopy-and-datatable?rq=1 ? – Caius Jard Nov 09 '17 at 17:02
  • yeah, look up how to bulk insert into database using XML – MethodMan Nov 09 '17 at 17:03
  • @MethodMan which is exactly the same. You are talking about bcp, not SqlBulkCopy, which will parse the data first, then use the *same* mechanism as SqlBulkCopy to bulk import the data. – Panagiotis Kanavos Nov 09 '17 at 17:19
  • 1
    Deleting because after investigating with these answers it turns out the problem was a simple lookup that got snagged int he benchmarking code. – NibblyPig Nov 09 '17 at 17:24
  • @NibblyPig before you delete, consider using ObjectReader so you *don't* need to buffer everything into a DataTable – Panagiotis Kanavos Nov 09 '17 at 17:34
  • @PanagiotisKanavos it's not the same thing.. my dba disables bulk copy to our database and I have created a way around it inserting bulk data into a database via XML and tmp tables.. sorry if you are not aware of how to do this simple awesome process.. – MethodMan Nov 09 '17 at 20:56
  • @MethodMan that's not "inserting bulk data". It's loading data in a very slow way. A flat file is a *lot* faster because it doesn't require any parsing to detect fields and records. And the dba can't disable bcp, BULK INSERT, the SSIS data destination or the Import Wizard which also works this way, i.e. it uses the bulk loading mechanism to quickly insert data into a table. You didn't need to create anything, just use the Import Wizard and save the SSIS package it generates. – Panagiotis Kanavos Nov 10 '17 at 08:44
  • 1
    @MethodMan as for the "simple awesome process" I'm loading data from 8 different systems right now - databases, FTP, S3 files, IATA HOT files (special parsing), screen scraping, web services (ie XML), REST services, matching them in the SSIS dataflow to find discrepancies. I even use TPL Dataflow to parse text files, transform them and insert them into the database in the same time it would take to write them to a text file for import. TPL Dataflow allows me to process multiple requests in parallel too, eg 10 REST queries at once – Panagiotis Kanavos Nov 10 '17 at 08:48
  • @MethodMan `bulk load` has a very specific meaning. It means you are sending a *stream* of data instead of individual commands, eliminating the overhead of executing individual commands and the memory needed for transferring batches of data. It also means that the database is using [minimal logging](https://technet.microsoft.com/en-us/library/ms191244(v=sql.105).aspx), ie instead of logging individual INSERTs, only data pages are logged. The combination can speed up inserting almost 10 times if not more – Panagiotis Kanavos Nov 10 '17 at 08:57

2 Answers


The delay is caused by having to buffer everything into a DataTable before sending it to the server. To get better performance you should send the records to SqlBulkCopy immediately, and let the class use its own buffering and batching.

SqlBulkCopy can work with an IDataReader. All ADO.NET data readers implement this interface, so you can push data that you read from any data reader to SqlBulkCopy.
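For example, here is a minimal sketch of streaming the result of a query on one connection straight into a table on another, without ever materialising a DataTable (the connection strings, query and table names are placeholders):

using (var source = new SqlConnection(sourceConnectionString))
using (var destination = new SqlConnection(destinationConnectionString))
{
    source.Open();
    destination.Open();

    using (var cmd = new SqlCommand("SELECT Name, Age FROM dbo.SourceRecords", source))
    using (var reader = cmd.ExecuteReader())       // any IDataReader works here
    using (var bcp = new SqlBulkCopy(destination))
    {
        bcp.DestinationTableName = "dbo.MyRecords";
        bcp.BatchSize = 10000;                     // let SqlBulkCopy batch the stream itself
        bcp.WriteToServer(reader);                 // rows are pulled from the reader as needed
    }
}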

In other cases, assuming you have an IEnumerable of your objects, you can use Marc Gravell's ObjectReader from the FastMember package to create an IDataReader on top of the IEnumerable. This data reader does not load everything at once, so no data is cached until SqlBulkCopy asks for it:

Copying Marc Gravell's example:

IEnumerable<SomeType> data = ... 

using(var bcp = new SqlBulkCopy(connection)) 
using(var reader = ObjectReader.Create(data, "Id", "Name", "Description")) 
{ 
  bcp.DestinationTableName = "SomeTable"; 
  bcp.WriteToServer(reader); 
}
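FastMember is available as a NuGet package. The member names passed to ObjectReader.Create become the reader's columns, so they should line up with the destination table (or be mapped explicitly through bcp.ColumnMappings).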
Panagiotis Kanavos

I don't know what the issue is. The program below runs in under a second, so I suspect the slow speed is due to reading the data rather than writing to the DataTable.

static void Main(string[] args)
{
    // Build a simple DataTable with ten typed columns
    DataTable dt = new DataTable();
    dt.Columns.Add("Col A", typeof(int));
    dt.Columns.Add("Col B", typeof(string));
    dt.Columns.Add("Col C", typeof(int));
    dt.Columns.Add("Col D", typeof(string));
    dt.Columns.Add("Col E", typeof(int));
    dt.Columns.Add("Col F", typeof(string));
    dt.Columns.Add("Col G", typeof(int));
    dt.Columns.Add("Col H", typeof(string));
    dt.Columns.Add("Col I", typeof(int));
    dt.Columns.Add("Col J", typeof(string));

    DateTime begin = DateTime.Now;

    // Add 7500 rows - the amount the question reports as taking a minute
    for (int i = 0; i < 7500; i++)
    {
        dt.Rows.Add(new object[] {
            i + 10000, "b", i + 20000, "d", i + 30000, "f", i + 40000, "h", i + 50000, "i"
        });
    }

    DateTime end = DateTime.Now;

    Console.WriteLine((end - begin).ToString());

    Console.ReadLine();
}
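As a side note on the measurement itself, a Stopwatch gives higher-resolution timing than DateTime.Now; a minimal sketch of the same measurement:

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 7500; i++)
{
    dt.Rows.Add(new object[] { i + 10000, "b", i + 20000, "d", i + 30000, "f", i + 40000, "h", i + 50000, "i" });
}
sw.Stop();
Console.WriteLine(sw.Elapsed);   // elapsed time for adding the rows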
jdweng