
Possible Duplicate:
Multiple INSERT statements vs. single INSERT with multiple VALUES

I'm doing some performance analysis on transactions for batch processing for a blog post, and I've noticed that a multi-row batch INSERT statement performs much slower than the equivalent individual SQL statements.

Inserting 1000 rows as below takes approximately 3s:

INSERT TestEntities (TestDate, TestInt, TestString) VALUES
('2011-1-1', 11, 'dsxcvzdfdfdfsa'),
('2011-1-1', 11, 'dsxcvzdfdfdfsa'),
-- ... continuing like this to 1000 rows in total ...
('2011-1-1', 11, 'dsxcvzdfdfdfsa')

Inserting 1000 rows as below takes 130ms:

INSERT TestEntities (TestDate, TestInt, TestString) VALUES ('2011-1-1', 11, 'dsxcvzdfdfdfsa')
INSERT TestEntities (TestDate, TestInt, TestString) VALUES ('2011-1-1', 11, 'dsxcvzdfdfdfsa')
INSERT TestEntities (TestDate, TestInt, TestString) VALUES ('2011-1-1', 11, 'dsxcvzdfdfdfsa')
-- ... repeated until 1000 rows are inserted ...

This only appears to happen the first time you use a batch insert on the table, but it's reproducible.

Also note that the data I'm inserting is random (but identical for both queries).

EDIT:

Here's my repro case, with the dummy random data I'm using: https://gist.github.com/2489133

3 Answers


According to Multiple INSERT statements vs. single INSERT with multiple VALUES, the issue is that when SQL Server receives a query it has to build a query plan on first execution. For a single-row insert this is nice and quick, as there's not much to calculate, and once the plan is built it is simply reused for all 1000 executions.

In the batch scenario, there are 3,000 values (1000 rows × 3 columns) that have to be folded into a single query plan, which takes much longer to compile.
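
If you want to see the compile cost yourself, SQL Server can report parse/compile time separately from execution time; a minimal sketch, run in the same session as the inserts:

SET STATISTICS TIME ON;
-- run either version of the insert here; the Messages output then shows
-- "SQL Server parse and compile time" (building the plan) separately from
-- "SQL Server Execution Times" (actually running the statement)
SET STATISTICS TIME OFF;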

One surprising detail, which @MartinSmith points out, is that there is a threshold at a batch size of around 250 rows, below which the plan compilation cost stays very low.

Breaking my query above into five 200-row statements reduces the execution time to 94ms for 1000 rows.
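
As a sketch of what that split looks like (each statement stays under the ~250-row threshold; the row values are elided here):

-- one 200-row statement
INSERT TestEntities (TestDate, TestInt, TestString) VALUES
('2011-1-1', 11, 'dsxcvzdfdfdfsa'),
-- ... 198 more rows ...
('2011-1-1', 11, 'dsxcvzdfdfdfsa');
-- ... followed by four more 200-row INSERT statements like the one above,
-- covering the remaining 800 rows ...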


The first item is one large statement that has to be parsed as a whole, so the extra time spent there is the overhead of one large parse job instead of 1000 small ones.

Although I didn't test for all 1000 rows, I did test for 3, and found that the execution plan for the single insert statement is larger. Also note that for 3 separate inserts, there's just one small plan being reused.

(Two screenshots of the execution plans described above.)
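
If you want to compare the cached plans yourself, the plan cache DMVs show both the size of each plan and how often it has been reused; a rough sketch, assuming you have VIEW SERVER STATE permission:

-- size_in_bytes: how big each cached plan is
-- usecounts: how many times the plan has been reused
SELECT cp.usecounts, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%TestEntities%';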

John Dewey

The first one is a single statement run as a single transaction. The second one is 1000 statements with the overhead of 1000 transactions. The difference should become smaller when you enclose the second one in `BEGIN TRANSACTION` and `COMMIT TRANSACTION`.
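
A minimal sketch of that wrapping:

BEGIN TRANSACTION;
INSERT TestEntities (TestDate, TestInt, TestString) VALUES ('2011-1-1', 11, 'dsxcvzdfdfdfsa')
-- ... the remaining 999 single-row INSERTs ...
COMMIT TRANSACTION;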

Andomar
  • Both of these are running within transactions, but the weird thing is that the second one is the faster one, by a factor of 20. I'd expect the first to be faster: it's a smaller payload and a single statement, as you say – Luke Apr 25 '12 at 11:47
  • Note that he's claiming that the single all-in-one statement runs SLOWER. – Hot Licks Apr 25 '12 at 11:48
  • @HotLicks: The one with `values` has only 1 transaction, which I'd expect to be faster. If both are already in one transaction, as Luke comments, this answer provides no explanation. – Andomar Apr 25 '12 at 11:50
  • They both have `values`. – Hot Licks Apr 25 '12 at 15:38