12

I am writing a stored procedure to insert rows into a table. The problem is that some operations may need to insert more than 1 million rows, and we want that to be fast. Another consideration is that one of the columns is NVARCHAR(MAX), and on average we expect to store about 1000 characters in it.

First, I wrote a prc (stored procedure) that inserts one row at a time. Then I generated some random data, with the NVARCHAR(MAX) column set to a 1000-character string, and used a loop to call the prc for each row. The performance is very bad: it takes 48 minutes if I log on to the database server and run it there through SQL Server, and more than 90 minutes if I connect from a C# program on my desktop (which is what we usually want to do).

Then I changed the prc to take a table type parameter as input. I prepare the rows, put them in the table type parameter, and do the insert with the following command:

INSERT INTO tableA SELECT * FROM @tableTypeParameterB

I tried batch sizes of 1000 and 3000 rows (i.e., putting 1000-3000 rows in @tableTypeParameterB per call). The performance is still bad: it takes about 3 minutes to insert 1 million rows when I run it on the SQL Server machine, and about 10 minutes when I connect from the C# program on my desktop.
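
Roughly, the C# side of that batched call looks like this (a simplified sketch; the procedure name dbo.InsertRows, the table type name dbo.RowType, and the column layout are placeholders, not my real names):

using System.Data;
using System.Data.SqlClient;

// Simplified sketch of one batched call. "dbo.InsertRows" and "dbo.RowType"
// are placeholder names; @tableTypeParameterB matches the proc parameter above.
static void InsertBatch(DataTable batch, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.InsertRows", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.AddWithValue("@tableTypeParameterB", batch);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.RowType";   // the user-defined table type
        conn.Open();
        cmd.ExecuteNonQuery();        // proc runs: INSERT INTO tableA SELECT * FROM @tableTypeParameterB
    }
}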

tableA has a clustered index on 2 columns.

My target is to make the insert as fast as possible (my ideal target is within 1 minute). Is there any way to optimize it?


Just an update:

I tried the bulk copy insert suggested by some people below. I used SqlBulkCopy to insert 1000 rows and 10,000 rows at a time. The performance is still 10 minutes to insert 1 million rows (every row has a column with 1000 characters). There is no performance improvement. Are there any other suggestions?
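
Roughly, this is what my test does (simplified; the connection string and exact column mapping are omitted, and in the real test I tweak rows between passes so they stay unique):

using System.Data;
using System.Data.SqlClient;

// Simplified version of my test: a 1000-row DataTable (each row carries a
// ~1000-character NVARCHAR(MAX) value) is written 1000 times in a loop.
static void BulkCopyTest(DataTable thousandRows, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        for (int i = 0; i < 1000; i++)             // 1000 x 1000 rows = 1 million
        {
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "tableA";
                bulk.BulkCopyTimeout = 600;
                bulk.WriteToServer(thousandRows);  // one call per 1000-row chunk
            }
        }
    }
}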


An update based on what the comments requested.

The data actually comes from the UI. The user will use the UI to bulk-select, say, one million rows and change one column from an old value to a new value. That operation will be done in a separate procedure. What we need to do here is have the mid-tier service take the old value and new value from the UI and insert them into the table. The old and new values may be up to 4000 characters each, with an average of 1000. I think the long old/new value strings slow the insert down, because when I change the test data's old/new values to 20-50 characters, the insert is very fast whether I use SqlBulkCopy or a table type variable.

Mandy
  • You should look into SqlBulkCopy – cost Jul 22 '14 at 01:04
  • Do they have to be inserted in order? – kemiller2002 Jul 22 '14 at 01:05
  • I inserted the 1,000,000-row test data from this puzzle challenge into SQL Server on my laptop in about 20 seconds using BCP: http://ask.sqlservercentral.com/questions/1227/the-subscription-list-sql-problem.html – Pieter Geerkens Jul 22 '14 at 01:13
  • No. We don't. Just insert is fine – Mandy Jul 22 '14 at 01:14
  • Are you reading the data from some file and inserting it? – Ranajit Chatterjee Jul 22 '14 at 01:05
  • Currently I am just doing an example. I read 1000 rows from another table and store them in a DataTable object, then insert them 1000 times into tableA with a command like "Insert into tableA Select @i, Column1, Column2, Column3... ColumnN from @tmpTableTypePrameter" – Mandy Jul 22 '14 at 01:08
  • I would have to look at some of my code at work tomorrow but I'm pretty sure I have populated a DataTable with a couple thousand records and then used SqlBulkCopy and only took a few seconds. You don't have to loop through the DataTable. You just have to make sure you have the fields mapped correctly (same name). Will post example of code tomorrow. – Randy R Jul 22 '14 at 01:48
  • @Mandy Please add the following info to your question: Is the data you need to insert coming from outside the sql server, or the data that needs to be inserted will be generated from existing data in your sql server? Where is your SQL Server located in relation to your C# application? Are they in the same operational system or same LAN? – Evandro Pomatti Jul 22 '14 at 02:00
  • 1,000,000 rows with 1000 characters each is 1GB of data. It's going to take a while to get across the network. – Blorgbeard Jul 22 '14 at 03:06
  • @Mandy, I like the post you've shared. Here is my problem, please look into this and help me: I've a table with 16 columns in an MSSQL database containing 1.5 crore records, and I insert about 3.25 lakh rows on a daily basis. Before inserting every new row, I check against 5 column values in the table; only if no row is found with those 5 values do I insert the new row. Please suggest how I can do this efficiently. I'm reading items one by one from a CSV file and inserting each row into the table. Currently it takes 9-10 hours to insert 3.24 lakh rows. Thanks in advance – dilipkumar1007 Aug 18 '17 at 09:12

5 Answers

9

I think what you are looking for is Bulk Insert if you prefer using SQL.

Or there is also the ADO.NET batch operations option, so you keep the logic in your C# application. This article is also very complete.

Update

Yes, I'm afraid BULK INSERT only works with imported files (read from a location the database server itself can access).

I have experience from a Java project where we needed to insert millions of rows (the data came from outside the application, by the way).

The database was Oracle, so of course we used Oracle's multi-row insert. It turned out that the Java batch update was much faster than Oracle's multi-valued insert (the so-called "bulk updates").

My suggestion is:

If the data you are going to manipulate is coming from outside your application (i.e., it is not already in the database), I would say just go for the ADO.NET batch inserts. I think that is your case.

Note: Keep in mind that batch inserts usually operate with the same query. That is what makes them so fast.
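
To make that concrete, here is a minimal sketch of an ADO.NET batch insert (a sketch only: the target table tableA, its column names, and the 1000-row batch size are assumptions). The important pieces are UpdateBatchSize and UpdatedRowSource.None:

using System.Data;
using System.Data.SqlClient;

// Minimal ADO.NET batch-insert sketch. "tableA" and its column names here
// are assumed for illustration; adjust them to the real schema.
static void BatchInsert(DataTable newRows, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter())
    {
        var insert = new SqlCommand(
            "INSERT INTO tableA (KeyCol1, KeyCol2, BigText) VALUES (@k1, @k2, @txt)", conn);
        insert.Parameters.Add("@k1", SqlDbType.Int, 0, "KeyCol1");
        insert.Parameters.Add("@k2", SqlDbType.Int, 0, "KeyCol2");
        insert.Parameters.Add("@txt", SqlDbType.NVarChar, -1, "BigText"); // -1 = NVARCHAR(MAX)
        insert.UpdatedRowSource = UpdateRowSource.None; // required when batching

        adapter.InsertCommand = insert;
        adapter.UpdateBatchSize = 1000;  // rows sent per round trip (0 = one big batch)

        conn.Open();
        adapter.Update(newRows);         // issues the insert for every row in the Added state
    }
}

The usual trade-off is the batch size: bigger batches mean fewer round trips but more data buffered per call on both sides.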

Evandro Pomatti
  • Bulk Insert can only insert from some local file path location? Actually, what we will do in real life is that the user will provide these rows in the UI. Then the mid-tier reads the rows and calls the procedure to insert them. So what I can think of is to use the table-typed parameter as the stored procedure input, then ask the mid-tier service to prepare the parameter and do a batch insert. But the performance is still not good. I googled bulk insert and it seems it can only insert from a file location, which might not be what I want. – Mandy Jul 22 '14 at 01:13
  • Bulk Insert (edit: via SqlBulkCopy in .Net code) works with any DataTable, doesn't have to be from a file. See: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx – Blorgbeard Jul 22 '14 at 01:15
  • @Blorgbeard I think Mandy was refering to the SQL commands, not the .NET API. – Evandro Pomatti Jul 22 '14 at 01:31
  • @Evandro you may be right. But SqlBulkCopying the rows from the mid-tier service would be fast, and might be acceptable. – Blorgbeard Jul 22 '14 at 01:34
  • Do I still need to batch the insert? I don't think bulk-inserting 1 million rows at once is acceptable since that will cause too much network traffic. I tried to insert 1000 rows at a time with SqlBulkCopy and it was still slow – Mandy Jul 22 '14 at 01:40
  • @Evandro, if the BulkCopy is fast, I can ask the mid-tier to use this instead of the SQL command stored procedure. But it still seems not fast, referring to my last comments. – Mandy Jul 22 '14 at 01:42
  • @Mandy Please add the following info to your question: Is the data you need to insert coming from outside the sql server, or the data that needs to be inserted will be generated from existing data in your sql server? Where is your SQL Server located in relation to your C# application? Are they in the same operational system or same LAN? – Evandro Pomatti Jul 22 '14 at 02:21
  • Also watch the memory constraints on your mid-tier (ensure your approach doesn't buffer memory). – bryanmac Jul 22 '14 at 02:56
  • SqlBulkCopy has built-in batching - just set the BatchSize property. Did you use that, or roll your own batching? Getting 1M rows across the network is certainly going to cause some traffic, however you do it. – Blorgbeard Jul 22 '14 at 03:01
  • @Evandro The user will use the UI to bulk-select, say, one million rows and change one column from the old value to a new value. This operation will be done in a separate procedure. What we need to do is make the mid-tier get the old value and new value from the UI and insert them into the table. The old/new value may be up to 4000 characters and the average is 1000 characters. I think the long old/new value strings slow down the speed, because when I change the test data's old/new values to 20-50 characters the insert is very fast whether I use SqlBulkCopy or a table type variable – Mandy Jul 22 '14 at 05:24
  • @Blorgbeard In my experiment I did a very simple thing: I selected 1000 rows from an existing table and put them in a DataTable object, then used a "for" loop to insert them into the table 1000 times, manipulating some rows to keep each row unique. I only summed the insert times, and the sum is more than 10 mins. – Mandy Jul 22 '14 at 05:31
  • Note that with many databases multi-value insert can be accelerated by using data compression on the connection. Often the limiting factor is the upload bandwidth of the network connection. I used this once with MySQL, but in general this should be possible with other databases as well. – sandimschuh Apr 11 '22 at 14:17
2

Calling a prc in a loop incurs many round trips to SQL.

Not sure what batching approach you used, but you should look into table-valued parameters: docs are here. You'll still want to write in batches.

You'll also want to consider memory on your server. Batching (say, 10K rows at a time) might be a bit slower, but it can keep memory pressure lower on the server since you're buffering and processing one set at a time.

Table-valued parameters provide an easy way to marshal multiple rows of data from a client application to SQL Server without requiring multiple round trips or special server-side logic for processing the data. You can use table-valued parameters to encapsulate rows of data in a client application and send the data to the server in a single parameterized command. The incoming data rows are stored in a table variable that can then be operated on by using Transact-SQL.

Another option is bulk insert. TVPs benefit from re-use however so it depends on your usage pattern. The first link has a note about comparing:

Using table-valued parameters is comparable to other ways of using set-based variables; however, using table-valued parameters frequently can be faster for large data sets. Compared to bulk operations that have a greater startup cost than table-valued parameters, table-valued parameters perform well for inserting less than 1000 rows.

Table-valued parameters that are reused benefit from temporary table caching. This table caching enables better scalability than equivalent BULK INSERT operations.

Another comparison here: Performance of bcp/BULK INSERT vs. Table-Valued Parameters
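
If buffering a full DataTable per batch turns out to be a memory problem on the mid-tier, a TVP can also be fed from a stream. A hedged sketch, assuming a user-defined table type dbo.ChangeRowType (OldValue and NewValue, both NVARCHAR(MAX)) and a procedure dbo.InsertChanges; both names are made up for illustration:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

// Streams rows into a TVP one at a time instead of materializing them all.
// dbo.ChangeRowType and dbo.InsertChanges are hypothetical names.
static IEnumerable<SqlDataRecord> AsRecords(IEnumerable<string[]> oldNewPairs)
{
    var meta = new[]
    {
        new SqlMetaData("OldValue", SqlDbType.NVarChar, SqlMetaData.Max),
        new SqlMetaData("NewValue", SqlDbType.NVarChar, SqlMetaData.Max)
    };
    foreach (string[] pair in oldNewPairs)
    {
        var record = new SqlDataRecord(meta);
        record.SetString(0, pair[0]);
        record.SetString(1, pair[1]);
        yield return record;   // one row at a time, never the whole set in memory
    }
}

static void InsertChanges(IEnumerable<string[]> oldNewPairs, string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.InsertChanges", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        SqlParameter p = cmd.Parameters.AddWithValue("@rows", AsRecords(oldNewPairs));
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ChangeRowType";
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}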

bryanmac
0

Here is an example of what I've used before with SqlBulkCopy. Granted, I was only dealing with around 10,000 records, but it inserted them within a few seconds after the query ran. My field names were the same, so it was pretty easy. You might have to modify the DataTable field names. Hope this helps.

using System;
using System.Configuration;
using System.Data;
using System.Data.SqlClient;

private void UpdateMemberRecords(Int32 memberId)
{
    // _sourceDb is the source SqlConnection field used by this class;
    // the "DestDb" app setting holds the destination connection string.
    string sql = string.Format("select * from Member where mem_id > {0}", memberId);
    try {
        DataTable dt = new DataTable();
        using (SqlDataAdapter da = new SqlDataAdapter(new SqlCommand(sql, _sourceDb))) {
            da.Fill(dt);
        }

        Console.WriteLine("Member Count: {0}", dt.Rows.Count);

        // Bulk-copy the DataTable into the destination "Member" table.
        using (SqlBulkCopy sqlBulk = new SqlBulkCopy(ConfigurationManager.AppSettings["DestDb"], SqlBulkCopyOptions.KeepIdentity)) {
            sqlBulk.BulkCopyTimeout = 600;
            sqlBulk.DestinationTableName = "Member";
            sqlBulk.WriteToServer(dt);
        }
    } catch (Exception) {
        throw;
    }
}
Randy R
-2

If you have SQL Server 2014, then the speed of In-Memory OLTP is amazing: http://msdn.microsoft.com/en-au/library/dn133186.aspx

Darren S
-4

Depending on your end goal, it may be a good idea to look into Entity Framework (or similar). This abstracts away the SQL so that you don't really have to worry about it in your client application, which is how things should be.

Eventually, you could end up with something like this:

using (DatabaseContext db = new DatabaseContext())
{
    for (int i = 0; i < 1000000; i++)
    {
        db.Table.Add(new Row(){ /* column data goes here */});
    }
    db.SaveChanges();
}

The key part here (and it boils down to a lot of the other answers) is that Entity Framework handles building the actual insert statement and committing it to the database.

In the above code, nothing will actually be sent to the database until SaveChanges is called and then everything is sent.

I can't quite remember where I found it, but there is research suggesting it is worthwhile to call SaveChanges every so often rather than once at the end. From memory, I think committing every 1000 entries is a good choice. Committing every single entry, compared to every 100 entries, doesn't provide much performance benefit, and 10,000 takes it past the sweet spot. Don't take my word for the exact numbers though, they could be wrong. You seem to have a good grasp on the testing side of things, so have a play around with it.
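
If you go this route, a hedged variation on the snippet above is to commit in chunks and recreate the context periodically so the change tracker doesn't grow without bound (the 1000-entry batch size is just the heuristic mentioned above; DatabaseContext and Row are the same placeholder types):

// Variation on the earlier snippet: commit every 1000 adds and recreate the
// context so the change tracker stays small. The batch size is illustrative.
DatabaseContext db = new DatabaseContext();
try
{
    for (int i = 1; i <= 1000000; i++)
    {
        db.Table.Add(new Row() { /* column data goes here */ });

        if (i % 1000 == 0)
        {
            db.SaveChanges();
            db.Dispose();
            db = new DatabaseContext();
        }
    }
    db.SaveChanges();   // flush the final partial batch
}
finally
{
    db.Dispose();
}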

Trent
  • Does abstracting away SQL lead to the best performance? How does your code compare to BCP or TVPs? I think you'll find it's less code but not as optimal (the question at hand) – bryanmac Jul 22 '14 at 01:20
  • @bryanmac I haven't bench-marked the situation, no. I just know from personal experience/research that using `SaveChanges()` outside the loop (or inside with conditions, such as once every 1000 adds) causes significant performance increases over using it on every iteration. I believe, as stated, that EF boils it down to T-SQL, which was mentioned in your answer. – Trent Jul 22 '14 at 02:41
  • http://stackoverflow.com/questions/13702756/which-one-is-faster-entity-framework-vs-stored-procedures/13703039#13703039 – bryanmac Jul 22 '14 at 02:46
  • read 3rd comment from post above. Yes, it generates T-SQL but generic T-SQL vs specifically typed T-SQL which is required for TVPs. – bryanmac Jul 22 '14 at 02:47
  • Ok, so that's a benchmark calling `SaveChanges()` after 100k `Add()` calls. So 100k records inserted at once. The research I found indicates this is sub-optimal and it's better to do more frequently. I believe re-creating the context is also good to do every so often as it reduces what's stored in the context. I'm not saying this is the best solution, but for maintainability purposes, I would much prefer this option and I think it should get the performance improvement that OP is after. – Trent Jul 22 '14 at 05:08