
I've got 512 rows to insert into a database. I'm wondering whether there is any advantage to submitting multiple inserts over one large insert. For example:

1x 512-row insert:

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555'), (2, '555-555-5555'), (3, '555-555-5555'), ...; -- repeat to id = 512

vs. 4x 128-row inserts:

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555'), (2, '555-555-5555'), (3, '555-555-5555'), ...; -- repeat to id = 128

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (129, '555-555-5555'), (130, '555-555-5555'), (131, '555-555-5555'), ...; -- repeat to id = 256, then the next 128, then the next 128

vs. 512x 1-row inserts:

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (1, '555-555-5555');

INSERT INTO mydb.mytable (id, phonenumber)
VALUES (2, '555-555-5555'); -- repeat until id = 512

And a question about testing this: say I take the first approach, one large 512-row insert, and it takes 0.5 seconds; the next time it takes 0.3 seconds. Does this kind of caching, like what happens when a programming language performs the same action twice, happen in SQL? (And would it therefore be necessary to average a good number of test runs for each approach?)

What other considerations should I keep in mind when doing extremely large inserts (say, half a million rows)? Is it true that if the packet sent to the database is too large, it will never receive or execute the query? Will I ever actually run into trouble by making a very large insert?
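For reference, I believe MySQL (at least) has exactly such a limit, max_allowed_packet; a statement larger than that is rejected with an error rather than executed. I can check the current value with something like:

SHOW VARIABLES LIKE 'max_allowed_packet';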

J-Dizzle

3 Answers


My answer assumes SQL Server; I suspect what I say applies to other SQL engines as well.

Much of the overhead of any SQL Server query is the development of an execution plan. If you do this as a single insert, the engine has to develop the execution plan once; if you do 512 separate inserts, it has to develop the execution plan 512 times. So a single insert has considerably less overhead.
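If you want to check whether and how often a plan is actually being reused, SQL Server exposes its plan cache through DMVs; here's a minimal sketch (the LIKE filter on mytable is just for illustration):

SELECT cp.usecounts, cp.cacheobjtype, cp.objtype, st.[text]
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.[text] LIKE '%mytable%';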

I wouldn't be surprised to find that the engine also finds other efficiencies that you wouldn't know to apply, wouldn't think to apply, or wouldn't be able to apply yourself. But even if it were only the execution-plan savings, it would still be worth doing as a single insert.

Joe Baker
  • Oh... and for a half-million inserts, the answer is the same, only more so. You may find it logistically necessary to break the load up somewhat (at some point memory could become an issue); but in general, the less you break it up, the better. – Joe Baker Oct 07 '14 at 19:57
  • On the execution-plan topic: my knowledge of execution plans is limited, but in doing 512 one-row inserts, would the execution plan be the same each time? If so, that would lead me to believe it gets cached, and a test of this would require a good set of averages, since the inserts would take less and less time. – J-Dizzle Oct 07 '14 at 20:00
  • One final thought -- are you doing this from Query Manager, or from code? If from code, then there's a much better way to approach it, using SqlBulkCopy. – Joe Baker Oct 07 '14 at 20:01
  • I've never seen any evidence that it caches the execution plans; I try to use Stored Procedures wherever possible, because it WILL generate and store the execution plan (possibly a solution you'd be interested in). – Joe Baker Oct 07 '14 at 20:03
  • What I'm actually doing would be through PHP, and you can talk on this if you'd like, but the question was mostly about the db engine itself. Is SqlBulkCopy a SQL Server-only thing? I suppose a stored procedure with variables is pretty much the best way to go in an engine like MySQL. I'm glad to have the first question answered, and you answered it well. – J-Dizzle Oct 07 '14 at 20:06
  • I'm sorry, I have no experience with PHP. SqlBulkCopy is a .NET Framework thing -- I believe it works with other SQL engines, but it wouldn't be available in PHP. Maybe someone else knows if PHP has something similar? – Joe Baker Oct 07 '14 at 20:09
  • And thanks for your kind words! I'm glad to be able to offer some help; I've certainly TAKEN a lot from this site. :) – Joe Baker Oct 07 '14 at 20:10
  • Perfectly fine, the scope of this question was intended to be only sql-engines anyway. I'll figure out the PHP, no problem! – J-Dizzle Oct 07 '14 at 20:13

The answer is likely to vary based on which RDBMS product you're using. One can't make a fine-grained optimization plan in an implementation-agnostic way.

But you can make broad observations; for example, it's generally a good idea to hoist loop-invariant work out of a loop.

In the case of a loop of many INSERTs to the same table, you can make an educated guess that the loop invariants include things like SQL parsing and query execution planning. Some implementations cache the query execution plan; others don't.
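As an illustration of hoisting that work out of the loop, here's a sketch using MySQL's server-side prepared statements (table and column names taken from the question); the statement is parsed once and then re-executed with new values:

PREPARE ins FROM 'INSERT INTO mydb.mytable (id, phonenumber) VALUES (?, ?)';
SET @id = 1, @phone = '555-555-5555';
EXECUTE ins USING @id, @phone; -- repeat with new values for each row
DEALLOCATE PREPARE ins;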

So we can assume that a single INSERT of 512 rows is likely to be more efficient. Again, your mileage may vary in a given implementation.

As for loading millions of rows, you should really consider bulk-loading tools. Most RDBMS brands have their own special tools or non-standard SQL statements for efficient bulk loading, and these can be faster than any INSERT-based solution by an order of magnitude.
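For example, MySQL has LOAD DATA INFILE, and SQL Server has BULK INSERT and the bcp utility. A sketch of the MySQL version (the file path and CSV layout are made up for illustration):

LOAD DATA INFILE '/tmp/phonenumbers.csv'
INTO TABLE mydb.mytable
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(id, phonenumber);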

So you have just wasted your time worrying about whether a single INSERT is a little bit more efficient than multiple INSERTs.

Bill Karwin
  • Though this would be the fastest, for my actual situation it seems like swatting a fly with a sledgehammer -- I'm closer to 10,000 rows, I don't actually need that level of efficiency, and I'd have to refactor too much to implement such a solution in a project that is almost done. Next time I have a project with larger data-ETL requirements, I will keep these non-INSERT solutions in mind. – J-Dizzle Oct 07 '14 at 20:44

For many databases, index maintenance adds overhead to every insert. It is worth testing whether it is faster to disable indexing before doing a large insert and then re-index the table afterwards.
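A sketch of that pattern in MySQL (note that DISABLE KEYS affects only non-unique indexes, and only on storage engines such as MyISAM; other databases have their own mechanisms, e.g. dropping and re-creating the index):

ALTER TABLE mydb.mytable DISABLE KEYS;
-- ...run the large insert here...
ALTER TABLE mydb.mytable ENABLE KEYS; -- rebuilds the disabled indexes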

rossum