I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT or are multiple separate INSERTs faster?
13 Answers
https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
- Connecting: (3)
- Sending query to server: (2)
- Parsing query: (2)
- Inserting row: (1 × size of row)
- Inserting indexes: (1 × number of indexes)
- Closing: (1)
From this it should be obvious that sending one large statement will save you an overhead of 7 per insert statement. Reading further, the text also says:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
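To make the difference concrete, here is a minimal sketch of the two styles; the table and column names are made up for illustration:
-- Three separate statements: the per-statement overhead (sending, parsing, etc.) is paid once per row.
INSERT INTO log_entries (user_id, message) VALUES (1, 'first');
INSERT INTO log_entries (user_id, message) VALUES (2, 'second');
INSERT INTO log_entries (user_id, message) VALUES (3, 'third');
-- One multi-row statement: the fixed overhead is paid once for all three rows.
INSERT INTO log_entries (user_id, message) VALUES
(1, 'first'),
(2, 'second'),
(3, 'third');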

- How does this answer apply if multiple single INSERTs are within the same database transaction? – Pinch Dec 16 '15 at 18:53
- How many rows can I insert at a time using a single INSERT statement? Will it allow me to insert 10,000 rows at a time? – Naresh Ramoliya Apr 30 '16 at 08:02
- @Pinch Using a transaction while doing ~1.5k upserts (insert/updates) decreased the time the operation took from ~1.5 seconds to ~0.2 seconds. Or in other words, it made it 86% faster compared to single-row inserts. Damn. – fgblomqvist May 24 '16 at 00:25
- Note: this seems to be much different in MSSQL: http://stackoverflow.com/questions/8635818/multiple-insert-statements-vs-single-insert-with-multiple-values – marsze Jan 16 '17 at 09:50
- How about using a prepared statement for inserting repetitive multiple single inserts? – priyabagus Sep 14 '18 at 11:56
I know I'm answering this question almost two and a half years after it was asked, but I just wanted to provide some hard data from a project I'm working on right now that shows that indeed doing multiple VALUE blocks per insert is MUCH faster than sequential single VALUE block INSERT statements.
The code I wrote for this benchmark in C# uses ODBC to read data into memory from an MSSQL data source (~19,000 rows, all are read before any writing commences), and the MySql .NET connector (Mysql.Data.*) stuff to INSERT the data from memory into a table on a MySQL server via prepared statements. It was written in such a way as to allow me to dynamically adjust the number of VALUE blocks per prepared INSERT (ie, insert n rows at a time, where I could adjust the value of n before a run.) I also ran the test multiple times for each n.
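For reference, a rough sketch of what such a statement looks like on the MySQL side with n = 3 VALUE blocks (hypothetical table and column names; the actual benchmark builds and binds these through the .NET connector rather than in SQL):
-- Prepare an INSERT with three VALUE blocks, then bind values and execute it.
PREPARE batch_insert FROM
'INSERT INTO target_table (col_a, col_b) VALUES (?, ?), (?, ?), (?, ?)';
SET @a1 = 1, @b1 = 'x', @a2 = 2, @b2 = 'y', @a3 = 3, @b3 = 'z';
EXECUTE batch_insert USING @a1, @b1, @a2, @b2, @a3, @b3;
DEALLOCATE PREPARE batch_insert;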
Doing single VALUE blocks (eg, 1 row at a time) took 5.7 - 5.9 seconds to run. The other values are as follows:
2 rows at a time: 3.5 - 3.5 seconds
5 rows at a time: 2.2 - 2.2 seconds
10 rows at a time: 1.7 - 1.7 seconds
50 rows at a time: 1.17 - 1.18 seconds
100 rows at a time: 1.1 - 1.4 seconds
500 rows at a time: 1.1 - 1.2 seconds
1000 rows at a time: 1.17 - 1.17 seconds
So yes, even just bundling 2 or 3 writes together provides a dramatic improvement in speed (runtime cut by a factor of n), until you get to somewhere between n = 5 and n = 10, at which point the improvement drops off markedly, and somewhere in the n = 10 to n = 50 range the improvement becomes negligible.
Hope that helps people decide on (a) whether to use the multiprepare idea, and (b) how many VALUE blocks to create per statement (assuming you want to work with data that may be large enough to push the query past the max query size for MySQL, which I believe is 16MB by default in a lot of places, possibly larger or smaller depending on the value of max_allowed_packet set on the server.)
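If you want to check or raise that limit on your own server, something along these lines should work (64 MB here is just an example value, and changing the global requires appropriate privileges):
-- See the current packet size limit.
SHOW VARIABLES LIKE 'max_allowed_packet';
-- Raise it for new connections; set it in my.cnf/my.ini to make it permanent.
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;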

- Clarification request: is your time "seconds per row" or "seconds total"? – EngrStudent Sep 24 '15 at 12:23
- Seconds total - so seconds per row is that divided by the ~19,000 rows. Though that's a small number, so perhaps rows/second is a better metric if you're looking for an easily comparable number. – Jon Marnock Sep 28 '15 at 03:49
- Incidentally, there's some example .NET code for the approach I describe above on this related answer of mine: https://stackoverflow.com/questions/25377357/insert-data-into-mysql-in-multiple-tables-in-c-sharp-efficiently/32370449#32370449 – Jon Marnock Jun 29 '18 at 08:02
- Note how this shows a 5:1 speedup that hits "diminishing returns" after about 50. My experience shows 10:1 and 100. (Close enough for govt work.) – Rick James Mar 30 '21 at 18:29
A major factor will be whether you're using a transactional engine and whether you have autocommit on.
Autocommit is on by default and you probably want to leave it on; therefore, each insert that you do does its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.
Assuming a single thread, that means that the server needs to sync some data to disk for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed RAM in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.
I'm of course assuming that you're using a transactional engine (usually innodb) AND that you haven't tweaked the settings to reduce durability.
I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit because some versions of MySQL have working group-commit in innodb - this means that multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.
Whatever the case, the upshot is that you REALLY WANT TO USE multi-row inserts.
There is a limit over which it gets counter-productive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.
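As a hedged sketch of what that looks like in plain SQL (hypothetical table, assuming InnoDB with autocommit on), both of the following pay for far fewer commits than 1,000 separate autocommitted INSERTs:
-- One multi-row INSERT is a single statement, hence a single commit/sync.
INSERT INTO my_table (a, b) VALUES
(1, 'r1'),
(2, 'r2'),
-- ... up to ~1,000 rows per statement ...
(1000, 'r1000');
-- Or keep single-row INSERTs but group them into one explicit transaction
-- so they all share one sync to persistent storage.
START TRANSACTION;
INSERT INTO my_table (a, b) VALUES (1, 'r1');
INSERT INTO my_table (a, b) VALUES (2, 'r2');
-- ...
COMMIT;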
If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.

- Is there any reason it gets counter-productive after a point? I've seen it happen before too, but wasn't sure why. – Dhruv Gairola Aug 09 '13 at 03:30
- Do you know if there is any point at all in batching MySQL inserts when **using transactions**? I'm just wondering if I can save myself the trouble of having to generate the multi-valued SQL command if my underlying library (Java JDBC - mysql-connector-java-5.1.30) is not actually committing until I tell it to. – RTF Mar 29 '15 at 15:02
- @RTF I think you will need to perform a small test to determine that behavior in your situation, as it's highly implementation-specific, but in many cases yes, transactions should provide similar performance gains. – Jasmine Hegman May 20 '18 at 08:25
Here are the results of a little PHP benchmark I did:
I'm inserting 3,000 records in 3 different ways, using PHP 8.0 and MySQL 8.1 (mysqli).
Multiple insert queries, with multiple transactions (one per insert):
$start = microtime(true);
for($i = 0; $i < 3000; $i++)
{
mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";
Did it 5 times, average: 11.132 s (+/- 0.6 s)
Multiple insert queries, with a single transaction:
$start = microtime(true);
mysqli_begin_transaction($res, MYSQLI_TRANS_START_READ_WRITE);
for($i = 0; $i < 3000; $i++)
{
mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
mysqli_commit($res);
$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";
Result with 5 tests : 0.48s (+/- 0.04s)
Single aggregated insert query
$start = microtime(true);
$values = "";
for($i = 0; $i < 3000; $i++)
{
$values .= "(null,now(), 'msg : $i','callstack','user','debug_speed','vars')";
if($i !== 2999)
$values .= ",";
}
mysqli_query($res, "insert into app__debuglog VALUES $values");
$end = microtime(true);
echo "Took " . ($end - $start) . " ms\n";
Result with 5 tests : 0.085s (+/- 0.05s)
So, for a 3,000-row insert, it looks like:
- Using multiple queries in a single write transaction is ~22 times faster than making multiple queries with a separate transaction for each insert.
- Using a single aggregated insert statement is still ~6 times faster than using multiple queries with a single write transaction.

- Be careful using .= with very large numbers of inserts, because the string concatenation will slow down (exponentially?) as the string gets larger. I've seen this mask the performance so that it might seem like individual transactions are actually faster when it's not the case. I use implode() on an array of rows as part of the insert query. E.g., "insert into tableA (fielda,fieldb) values " . implode(",", $arrayInsert) . " on duplicate key ..." – cpopolo Jun 28 '22 at 16:07
Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.

In general, the fewer calls to the database the better (meaning faster and more efficient), so try to code the inserts in such a way that they minimize database accesses. Remember, unless you're using a connection pool, each database access has to create a connection, execute the SQL, and then tear down the connection. Quite a bit of overhead!

- There is still overhead. Transit time alone (to and from for each separate insert) will quickly be perceptible if you're doing thousands of inserts. – RC. Nov 24 '09 at 22:02
I just did a small benchmark, and it appears that for a lot of rows, bigger batches are not always faster. Here are my results for inserting 280,000 rows:
- by 10,000: 164.96 seconds
- by 5,000: 37 seconds
- by 1,000: 12.56 seconds
- by 600: 12.59 seconds
- by 500: 13.81 seconds
- by 400: 14.75 seconds
- by 250: 17.96 seconds
- by 100: 27 seconds
It appears that batches of 1,000 are the best choice.

You might want to:
- Check that auto-commit is off
- Open a connection
- Send multiple batches of inserts in a single transaction (batches of roughly 4,000-10,000 rows, say; see the sketch below)
- Close the connection
Depending on how well your server scales (it's definitely OK with PostgreSQL, Oracle and MSSQL), do the above with multiple threads and multiple connections.
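A minimal sketch of that recipe in plain SQL, for a single connection and a hypothetical table (batch sizes and columns would of course come from your own data):
-- Turn off auto-commit for this session so all batches share one commit.
SET autocommit = 0;
START TRANSACTION;
-- Send several multi-row batches inside the same transaction.
INSERT INTO bulk_target (a, b) VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO bulk_target (a, b) VALUES (4, 'p'), (5, 'q'), (6, 'r');
-- ... more batches ...
COMMIT;
SET autocommit = 1;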

In general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.
Depending on which language you are using, you can possibly create a batch in your programming/scripting language before going to the db and add each insert to the batch. Then you would be able to execute a large batch using one connect operation. Here's an example in Java.

MySQL 5.5: one single-row INSERT statement took ~300 to ~450 ms, while the stats below are for inline multi-row INSERT statements.
(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time : 00:00:00:000
Total Time : 00:00:03:343
I would say inline is the way to go :)

It's ridiculous how badly MySQL and MariaDB are optimized when it comes to inserts. I tested MySQL 5.7 and MariaDB 10.3; there was no real difference between them.
I've tested this on a server with NVMe disks, 70,000 IOPS, and 1.1 GB/sec sequential throughput, full duplex (read and write).
The server is a high-performance machine as well. I gave it 20 GB of RAM, and the database was completely empty.
The speed I got was 5,000 inserts per second when doing multi-row inserts (tried with 1 MB up to 10 MB chunks of data).
Now the key point:
If I add another thread and insert into the SAME table, I suddenly get 2 × 5,000/sec. One more thread and I get 15,000 total/sec.
Consider this: when doing inserts from ONE thread, you can write to the disk sequentially (with exceptions for indexes). Using threads should actually degrade the possible performance, because it forces a lot more random accesses. But the reality check shows MySQL is so badly optimized that threads help a lot.
The real performance possible with such a server is probably millions of inserts per second; the CPU is idle and the disk is idle.
The reason is quite clearly that MariaDB, just like MySQL, has internal delays.

- @Craftables You need external development; it cannot be done within MySQL. Threads mean that you use multiple connections to the server and split the query into multiple chunks (for example, by splitting it into even parts by primary key). I managed to get up to 10,000 times the performance using this method on very large tables. Queries that would run for 40,000 seconds can finish in 2-3 minutes IF you use multiple threads and your MySQL is highly optimized. – John Feb 10 '19 at 16:34
- @John Interesting, and it may have some really nice applications... but... if you split the query into multiple chunks, how do you handle transactions? Also consider the following scenario: table x has a 'parent_id' column that relates to the same table's 'id'. Somewhere inside your data you have INSERT INTO x (`id`, `parent_id`) VALUES (1, NULL). One of the next sets of values links to that row. If you split into chunks and that set gets into another chunk, it may be processed before the first one, failing the whole process. Any idea how to deal with that? – zozo May 12 '19 at 22:47
- @zozo This is useful for bulk inserts and bulk queries. Transactions would ruin the performance anyway, as they involve plenty of data buffering. But you can also use transactions in multi-threaded inserts or queries. – John May 30 '19 at 02:51
I would add that inserting too many rows at a time, depending on their contents, can lead to Got a packet bigger than 'max_allowed_packet'.
Maybe consider using functions like PHP's array_chunk to do multiple inserts for your big datasets.

Multiple-row inserts are faster, but there is a threshold. Another trick is to temporarily disable constraint checks, which makes inserts much, much faster; it doesn't matter whether your table has them or not. For example, test disabling foreign key checks and enjoy the speed:
SET FOREIGN_KEY_CHECKS=0;
Of course, you should turn it back on after the inserts with:
SET FOREIGN_KEY_CHECKS=1;
This is a common way of inserting huge amounts of data. Data integrity may break, so you should take care of that before disabling foreign key checks.

- No idea why people upvoted this, for two reasons: 1. It has nothing to do with the question. 2. It is a really bad idea (with a few exceptions, like dumping or structural temp changes, but bad in general). The checks are there for a reason: they ensure data consistency. They slow things down because they ensure you don't insert or otherwise change data that you should not. Try to optimize queries the right way; in any business-critical environment this would mean the death of the app, since regardless of how careful you are, things will fail at some point. – zozo May 12 '19 at 22:34
- Maybe, but this option is extremely effective for importing big tables, and very practical, and it could give some people an idea of how they can make data insertion much faster. – MSS May 13 '19 at 03:07