
The MySQL docs say:

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

Does this mean that for each new row inserted, the insertion speed will be slowed down by a factor of log N, where N, I assume, is the number of rows? Even if I insert all the rows in just one query, i.e.:

INSERT INTO mytable VALUES (1,1,1), (2,2,2), (3,3,3), ..., (n,n,n)

Where n is ~70,000

I currently have ~1.47 million rows in a table with the following structure:

CREATE TABLE mytable (
   `id` INT,
   `value` MEDIUMINT(5),
   `date` DATE,
   PRIMARY KEY (`id`, `date`)
) ENGINE = InnoDB;

When I insert in the above-mentioned fashion in a transaction, the commit takes ~275 seconds. How can I optimize this, since new data is to be added every day and the insert time will just keep slowing down?

Also, is there anything apart from the queries themselves that might help? Maybe some configuration settings?

Possible Method 1 - Removing Indices

I read that removing indices just before the insert might help insert speed, re-adding the index after the inserts finish. But here the only index is the primary key, and dropping it won't help much in my opinion. Also, while the primary key is dropped, all the SELECT queries will be cripplingly slow.
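
A minimal sketch of that approach against this table, for reference - the ADD step is where the whole index gets rebuilt, which is the expensive part:

ALTER TABLE mytable DROP PRIMARY KEY;
-- bulk INSERT / LOAD DATA goes here
ALTER TABLE mytable ADD PRIMARY KEY (`id`, `date`);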

I do not know of any other possible methods.

Edit: here are a few tests on inserting ~60,000 rows into the table that already has ~1.47 million rows:

Using the plain query described above: 146 seconds

Using MySQL's LOAD DATA INFILE: 145 seconds

Using MySQL's LOAD DATA INFILE and splitting the csv files as suggested by David Jashi in his answer: 136 seconds for 60 files with 1,000 rows each, 136 seconds for 6 files with 10,000 rows each

Removing and re-adding the primary key: key removal took 11 seconds and inserting the data took 0.8 seconds, BUT re-adding the primary key took 153 seconds, ~165 seconds in total

– Peeyush Kushwaha
  • Why do you have 'timestamp' in your primary key? Take it out! – Old Pro Jun 07 '13 at 06:51
  • @OldPro it was a typo, fixed now – Peeyush Kushwaha Jun 07 '13 at 06:52
  • But still, isn't the ID unique? If so, then the only possible value of including the date in the index is if you are doing queries which do not return the 'value'. – Old Pro Jun 07 '13 at 06:56
  • How big is n? Because there are a lot of other things that take time besides updating the indexes. Normally around 1000 inserts per transaction is the fastest. Then setting innodb_flush_log_at_trx_commit to 0 or 2 is normally much faster than the default of 1 (make sure to read up on the setting before changing it, as it affects ACID compliance). – Andreas Wederbrand Jun 07 '13 at 06:57
  • n is ~70,000, i.e. ~70,000 rows are being inserted into a table that already contains 1.47 million rows. – Peeyush Kushwaha Jun 07 '13 at 07:02
  • Is the id unique by itself? – Sebas Jun 07 '13 at 14:33
  • No, not in this table, since new data is added for each id every day. – Peeyush Kushwaha Jun 07 '13 at 14:36
  • Run this: SHOW GLOBAL VARIABLES LIKE 'innodb_log_file_size'; What is the value? – bobwienholt Jun 07 '13 at 15:17
  • @bobwienholt I get: `innodb_log_file_size` `5242880` – Peeyush Kushwaha Jun 07 '13 at 15:30
  • I would recommend increasing innodb_log_file_size. The default is 5MB, which is ridiculously small. 64MB might be a good number. Follow the instructions here: http://www.mysqlperformanceblog.com/2011/07/09/how-to-change-innodb_log_file_size-safely/ – bobwienholt Jun 07 '13 at 16:07
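
For reference, a sketch of how the settings discussed in these comments would look in my.cnf; the values are illustrative, not recommendations, and on the MySQL 5.5-era versions in question, changing innodb_log_file_size required a clean shutdown and removing the old ib_logfile* files first (the linked article walks through it):

[mysqld]
# Flush the log to disk roughly once per second instead of at every
# commit; faster, but relaxes ACID durability guarantees.
innodb_flush_log_at_trx_commit = 2
# The 5 MB default is far too small for bulk loads.
innodb_log_file_size = 64M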

5 Answers


If you want fast inserts, the first thing you need is proper hardware. That means a sufficient amount of RAM, an SSD instead of mechanical drives, and a reasonably powerful CPU.

Since you use InnoDB, you want to tune it, because the default configuration is designed for slow, old machines.

Here's a great read about configuring InnoDB

After that, you need to know one thing - how databases do their work internally, how hard drives work, and so on. I'll simplify the mechanism in the following description:

On commit, a transaction makes MySQL wait for the hard drive to confirm that it wrote the data. That's why transactions are slow on mechanical drives: they can do 200-400 input/output operations per second. Translated, that means you can get around 200 insert queries per second using InnoDB on a mechanical drive. Naturally, this is a simplified explanation, just to outline what's happening; it's not the full mechanism behind a transaction.

Since a single query, especially one inserting rows as small as your table's, is tiny in terms of bytes, you're effectively wasting precious IOPS on a single query.

If you wrap multiple queries (100 or 200 or more; there's no exact number, you have to test) in a single transaction and then commit it, you'll instantly achieve more writes per second.
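
A minimal sketch of that batching, using made-up rows shaped like the question's table (the batch size is something to benchmark, not a fixed rule):

SET autocommit = 0;
START TRANSACTION;
-- each INSERT carries a batch of rows; the values are placeholders
INSERT INTO mytable VALUES (1, 100, '2013-06-07'), (2, 200, '2013-06-07');
INSERT INTO mytable VALUES (3, 300, '2013-06-07'), (4, 400, '2013-06-07');
COMMIT;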

The Percona guys are achieving 15k inserts a second on relatively cheap hardware. Even 5k inserts a second isn't bad. A table such as yours is small; I've done tests on a similar table (3 more columns) and managed to get to 1 billion records without noticeable issues, using a machine with 16 GB of RAM and a 240 GB SSD (1 drive, no RAID, used for testing purposes).

TL;DR: follow the link above, configure your server, get an SSD, wrap multiple inserts in one transaction, and profit. And don't turn indexing off and then on again; it isn't always applicable, because at some point you will spend the processing and IO time to rebuild the indexes.

– N.B.
  • Your solution of having multiple queries instead of one large query doesn't help; the insert time is the same. Also, changing hardware is no answer to this problem. I know what I'm doing, and I'm doing it with 8 GB RAM, a 1 TB HDD and a rather powerful i3 CPU. Also, I don't understand why I would need 16 GB of RAM for inserting 5 MB of data into a 60 MB table? That is just ridiculous. – Peeyush Kushwaha Jun 07 '13 at 16:23
  • You use prepared statements. That's only 1 additional line of code. After that you issue the query as if it were a regular one. That makes MySQL lex only 1 query, and then it just receives parameters (data) and performs inserts. Also, your having a 1 TB hard drive and 8 GB of RAM does not mean that the drive is capable of many IOs, nor that you configured your MySQL to use sufficient RAM. As for the 16 GB of RAM: no one said you **must** use it. The link I posted explains the mechanism, that's all. If you prefer to run a big query without indexes - sure, go for it. – N.B. Jun 07 '13 at 16:28
  • As I said, I did split up the large query into one query per row and also executed them separately but within the same transaction. That didn't make it any faster; compared with the large query, there was no difference in execution time at all. – Peeyush Kushwaha Jun 07 '13 at 18:14
  • Turns out I was wrong. I mistook the default innodb_buffer_pool_size to be 8G when actually it was 8M. So now, after increasing it to 5G, the query that was taking over 2 minutes happily inserts well within 2 seconds. – Peeyush Kushwaha Jun 07 '13 at 18:56
  • I'm glad you tuned your MySQL instance and got promising results without major hardware changes. Good luck with further endeavours :) – N.B. Jun 07 '13 at 21:23
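
To see the value the comments above are talking about (reported in bytes):

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';

On the MySQL versions current at the time, raising it meant editing my.cnf (e.g. innodb_buffer_pool_size = 5G under [mysqld]) and restarting the server; online resizing only arrived in MySQL 5.7.5.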

Dropping the index will surely help. Also consider using LOAD DATA - there's a sketch below. You can find some comparison and benchmarks here.
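
A minimal sketch of the LOAD DATA route, assuming a comma-separated file at the hypothetical path /tmp/mytable.csv with one id,value,date row per line:

LOAD DATA INFILE '/tmp/mytable.csv'
INTO TABLE mytable
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(`id`, `value`, `date`);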

Also, when constructing the PRIMARY KEY, use fields that come first in your table, in order - i.e. swap the second and third fields in the structure so that the key columns lead the row, as sketched below.
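
A sketch of that reordering, keeping everything else from the question's definition:

CREATE TABLE mytable (
   `id` INT,
   `date` DATE,
   `value` MEDIUMINT(5),
   PRIMARY KEY (`id`, `date`)
) ENGINE = InnoDB;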

– David Jashi
  • Would writing ~70,000 rows of 25 bytes each and then using `LOAD DATA` be faster than inserting directly inside a transaction? – Peeyush Kushwaha Jun 07 '13 at 07:06
  • You are preparing INSERT statements somewhere anyway, aren't you? Writing raw data without the SQL overhead is much faster, isn't it? – David Jashi Jun 07 '13 at 07:08
  • Time taken is almost the same when inserting all rows at once. However, I could save 9 seconds when I split up the csv files as suggested in the blog post. I've updated my question to include some test cases I tried. – Peeyush Kushwaha Jun 07 '13 at 15:16
  • I found this article extremely useful: http://kvz.io/blog/2009/03/31/improve-mysql-insert-performance/ For my INSERTs, LOAD DATA reduces insert time by 65-80% when inserting 2k-10k rows at a time. However, don't build your .csv file row by row in a loop. Instead, build an array of all the rows you want to insert, save the array to the .csv file all at once (with the proper format, of course), and then use a LOAD DATA insert query. – Lorien Brune Oct 12 '15 at 16:35

If you are doing a bulk insert of a million rows, then dropping the index, doing the insert, and rebuilding the index will probably be faster. However, if your problem is that single-row inserts are taking too long, then you have other problems (like not enough memory) and dropping the index will not help much.

– Old Pro
  • I've done some tests; while inserting data is really fast after index removal, rebuilding the index isn't. – Peeyush Kushwaha Jun 07 '13 at 14:28
  • Removing indexes for inserts is one of the worst "methods" of optimizing inserts. Anyone claiming you should do that is just talking out of their ass. Wrapping multiple insert queries into a single transaction is what makes it fly (if you use a transactional engine). – N.B. Jun 07 '13 at 15:30
  • @N.B. You think the MySQL documentation team is talking out of their asses when [they say](http://dev.mysql.com/doc/refman/5.7/en/load-data.html) "In some extreme cases, you can create the indexes even faster by turning them off with ALTER TABLE ... DISABLE KEYS before loading the file into the table and using ALTER TABLE ... ENABLE KEYS to re-create the indexes after loading the file"? If so, I'd like to see your evidence. – Old Pro Jan 08 '16 at 20:19
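
For completeness, a sketch of the commands that quoted passage refers to; note that DISABLE KEYS/ENABLE KEYS only affects non-unique indexes on MyISAM tables, so it would not help with the InnoDB primary key in this question:

ALTER TABLE mytable DISABLE KEYS;
-- bulk load here, e.g. LOAD DATA INFILE
ALTER TABLE mytable ENABLE KEYS;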

Building/rebuilding the index is what you're trying to speed up. If you must have this table/key structure, faster hardware and/or tweaking the server configuration to speed up the index build is likely the answer - make sure your server and settings are such that the build can be done in memory.

Otherwise, think about making trade-offs with the structure that would improve insert speeds. Alternatively, think about ways you can happily live with a 3-minute insert.

– JR Lawhorne
  • Those 3-minute inserts are part of a much, much bigger program :D Can you tell me in detail how I can speed up the index build, or which settings I should tweak? – Peeyush Kushwaha Jun 07 '13 at 15:11

I have found that, in some cases, inserting in medium-sized chunks inside transactions can help, as it seems to sometimes allow some bulk operations. In other cases it has made things slower, presumably due to locks and the overhead of transactions.

– jgmjgm