
I'm doing some performance tests on a MySQL database. All of the tables use the InnoDB engine, and the MySQL config is almost the default one, with no special flags set.

I have a test.sql file with 64 000 SQL INSERTs, each written in the INSERT INTO TableName(column1, column2) VALUES (value1, value2) format.

I run this SQL file on the mysql client by executing the following line in the command prompt: mysql -u username -ppassword db_name < test.sql

When doing this, executing the file takes about a minute. In the next step I broke the 64 000 statements into two files, each with 32 000 rows. In a .bat file I open two mysql clients simultaneously, one running the file containing the first part, the other one running the second file. Execution time drops to 34 seconds.
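For reference, a minimal sketch of such a .bat file, assuming the two halves are saved as part1.sql and part2.sql (placeholder names):

REM Launch two mysql clients in parallel, each importing one half of the statements.
start "import1" cmd /c "mysql -u username -ppassword db_name < part1.sql"
start "import2" cmd /c "mysql -u username -ppassword db_name < part2.sql"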

So I did this a couple more times, and the times drop drastically. The measurements are done by adding a timestamp to each inserted row, and subtracting the minimum timestamp from the maximum timestamp.

  • 1 client - 58 sec
  • 2 clients - 34 sec
  • 4 clients - 18.5 sec
  • 8 clients - 14.68 sec
  • 16 clients - 9.79 sec
  • 32 clients - 9.14 sec
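To clarify how these numbers are obtained: the total time is the difference between the earliest and the latest inserted timestamp. A minimal sketch of such a query, assuming a hypothetical inserted_at timestamp column on TableName:

-- Wall-clock duration of the whole import, computed from the inserted timestamps.
SELECT TIMEDIFF(MAX(inserted_at), MIN(inserted_at)) AS total_time FROM TableName;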

Before starting the measurements, my guess was that no matter how many files I split the statements into, as long as the total count stays the same, the overall time would be almost the same in all cases. I'm not really interested in speeding up the whole import; I'm rather curious what makes the times drop so much.

The tests are run on a machine powered by this Xeon processor.

Daniel Zolnai
  • The real answer is efficient hard disk usage. Running more threads simply used the drive's I/O and bandwidth to its maximum potential, along with more efficient CPU use to lex and execute the queries. – N.B. Nov 16 '14 at 18:32
  • That is not true, because on most real tables (especially with the InnoDB engine) the disk will be used in parallel even if only one thread inserts. The MySQL engine is not such a simple thing nowadays. – arheops Nov 16 '14 at 18:39
  • @N.B.: There is some truth in it, but it gives just a small performance increase. I made this picture: https://dl.dropboxusercontent.com/u/52042554/factors.png The first part is inserting 32 000 rows in one client. The second part (the higher one) is inserting 64 000 rows in 32 clients. There is a ~2x speed increase, but not an 8x. – Daniel Zolnai Nov 16 '14 at 19:07
  • That is normal. There are almost no algorithms that scale 8x when you run 8 threads. Any IT university has a multithreading or similar course in the 3rd-4th year of study; you can find one on the web and go through it. More threads = less additional performance from each new one, especially when you need consistency (locks / one table). – arheops Nov 16 '14 at 20:00
  • I know that that is normal, I just explained that this answer alone does not account for the whole increase in speed. – Daniel Zolnai Nov 16 '14 at 20:04
  • @arheops - actually, it is true. The disk was used more efficiently. The sad truth is that one can get better insert performance using 1 thread and optimizing the number of insert queries within a transaction. Doing so avoids context switches and the need to deal with concurrency. Threads haven't really helped anything here; the whole operation can be even faster without threading. – N.B. Nov 17 '14 at 08:44

2 Answers

1

It can be faster because each thread locks a different part of the index.

Try the following:

echo "begin;" >/tmp/begin
echo "commit;">/tmp/commit

cat /tmp/begin test.sql /tmp/commit |mysql -u username -ppassword db_name 

That should be even faster (one lock at the start).
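Following the same idea, the comments below suggest committing in batches instead of one giant transaction. A rough sketch of that variant, assuming one INSERT statement per line in test.sql (the batch size of 1000 is arbitrary):

# Start a transaction, commit after every 1000 statements, and commit the remainder at the end.
awk 'BEGIN { print "BEGIN;" }
     NR > 1 && NR % 1000 == 1 { print "COMMIT; BEGIN;" }
     { print }
     END { print "COMMIT;" }' test.sql | mysql -u username -ppassword db_name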

arheops
  • Running with autocommit turned off reduces the 58 secs to 21 seconds. Can you tell me more about this index locking you mention? I know the basics of autocommit, and why this gives a speed increase, but when running on two files there are still 64k commits (one after each insert), so I don't really get why that still takes less time. My guess is that if I turn autocommit off for each part, I just speed the whole process up by a lot, instead of solving the "mystery". – Daniel Zolnai Nov 16 '14 at 19:33
  • If your tables use an index (at least one primary key always exists, even if you did not specify one), then before inserting you need to start a transaction, lock the table, lock the index section, insert, do all the unlocks and commit. On commit you actually have to rebuild the index. As a result your SQL does a lot of unneeded operations; if you use a transaction (begin/commit) you have less overhead. In MySQL above 5.5 you can control how many sections each index has. You can also control the cache size for the index/table. – arheops Nov 16 '14 at 19:49
  • You can speed it up even more if you use a transaction section in each file and run in 4-8 threads (depending on the InnoDB settings). For example, try doing that in packets of 30 inserts, followed by "commit;begin;". – arheops Nov 16 '14 at 19:50
  • You need to know a lot to fully understand the operations that happen behind your "simple insert". For a first approximation, consider these articles: http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-performance-change_buffering.html http://mysqlha.blogspot.com/2008/12/innodb-insert-performance.html http://stackoverflow.com/questions/9819271/why-is-mysql-innodb-insert-so-slow – arheops Nov 16 '14 at 19:55
  • Thank you, I will definitely look into those articles. – Daniel Zolnai Nov 16 '14 at 20:04
0

This will probably be due to threading, and will be more visible on multi-core systems. Processing things in parallel will be faster than doing them one at a time.

Seer
  • And why wouldn't the mysql client do this by default to speed the whole process up? Also, the performance gain is still visible when switching from 16 to 32. When dealing with such high numbers, the threading gain should be lost, and execution times should even get worse because of the massive context switching. – Daniel Zolnai Nov 16 '14 at 18:59
  • The mysql client can't do that by default because MOST SQL operations affect different tables based on the ORDER of the SQL; the client can't guess that. There is no massive switching, the mysql daemon will limit the switching by itself. – arheops Nov 16 '14 at 19:57