
I'm trying to speed up bulk inserts into an InnoDB table by temporarily disabling its indexes:

ALTER TABLE mytable DISABLE KEYS;

But it gives a warning:

+-------+------+-------------------------------------------------------------+
| Level | Code | Message                                                     |
+-------+------+-------------------------------------------------------------+
| Note  | 1031 | Table storage engine for 'mytable' doesn't have this option |
+-------+------+-------------------------------------------------------------+
1 row in set (0.00 sec)

How can we disable the indexes?

What alternatives are there to avoid using the index when doing bulk inserts?

How can we speed up the process?

– fanchyna

4 Answers


Have you tried the following?

    SET autocommit=0; 
    SET unique_checks=0; 
    SET foreign_key_checks=0;

From the MySQL Reference Manual: https://dev.mysql.com/doc/refman/8.0/en/optimizing-innodb-bulk-data-loading.html

See the section "Bulk Data Loading Tips".
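
For reference, here is a minimal session sketch of how those settings are typically wrapped around the load; the table name, file path, and delimiters are hypothetical:

    -- Sketch only: `mytable` and the CSV path are made-up examples.
    SET autocommit=0;
    SET unique_checks=0;
    SET foreign_key_checks=0;

    LOAD DATA LOCAL INFILE '/tmp/mytable.csv'
        INTO TABLE mytable
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\n';

    COMMIT;

    -- Restore the defaults once the bulk load has finished.
    SET unique_checks=1;
    SET foreign_key_checks=1;
    SET autocommit=1;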

– lurkerbelow
  • I tried to do that. But the point is: how do I know whether the index is enabled or disabled? – fanchyna Mar 07 '12 at 18:14
  • I know that the "EXPLAIN" statement can help me understand a "SELECT" statement, but what about the "LOAD DATA LOCAL INFILE" statement? How do I know if the index is enabled when running this command? Thanks. – fanchyna Mar 07 '12 at 19:03
  • @fanchyna, Disable/enable keys is a global setting, not per session. Use [`show keys in my_table`](http://stackoverflow.com/questions/4980917/mysql-disable-keys/28667994#28667994) to see if the indexes are being disabled or enabled. – Pacerier Feb 23 '15 at 14:39
  • Turning those off could cause trouble later. – Rick James Jun 10 '21 at 06:36
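
Following up on the comment thread above: one way to check is SHOW INDEX (a synonym of SHOW KEYS); on a MyISAM table with keys disabled, the Comment column of the affected secondary indexes reads "disabled". A quick sketch:

    SHOW INDEX FROM mytable;
    -- On MyISAM with DISABLE KEYS in effect, the secondary indexes show
    -- 'disabled' in the Comment column; on InnoDB they never will, since
    -- DISABLE KEYS is a no-op there.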

There is a very good reason why you cannot execute DISABLE KEYS on an InnoDB table: InnoDB is not designed to use it, whereas MyISAM is.

In fact, here is what happens when you reload a mysqldump:

You will see a CREATE TABLE for a MyISAM table followed by a write lock.

Before all the bulk inserts are run, a call to ALTER TABLE ... DISABLE KEYS is done.

What this does is turn off secondary indexes in the MyISAM table.

Then, the bulk inserts are done. While this is happening, the PRIMARY KEY and all UNIQUE KEYS in the MyISAM table are still being built. Before the UNLOCK TABLES, a call to ALTER TABLE ... ENABLE KEYS is done in order to rebuild all non-unique indexes linearly.
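
To make that sequence concrete, the relevant part of a dump for a MyISAM table looks roughly like this (table name and values are illustrative):

    LOCK TABLES `mytable` WRITE;
    /*!40000 ALTER TABLE `mytable` DISABLE KEYS */;
    INSERT INTO `mytable` VALUES (1,'a'),(2,'b'),(3,'c');
    /*!40000 ALTER TABLE `mytable` ENABLE KEYS */;
    UNLOCK TABLES;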

IMHO this operation was not coded into the InnoDB storage engine because every entry in a non-unique (secondary) index carries the primary key entry from gen_clust_index (aka the Clustered Index). Rebuilding would be a very expensive operation, since building a non-unique index would require O(n log n) running time to retrieve each primary key entry and attach it to a non-unique key.

In light of this, posting a warning when you try DISABLE KEYS / ENABLE KEYS on an InnoDB table is far easier than coding exceptions into mysqldump for every special case involving non-MyISAM storage engines.

– RolandoMySQLDBA
  • Thank you for your explanation, but the question still remains: how can I avoid using the index when uploading data, so as to reduce the import time? Thanks. – fanchyna Mar 07 '12 at 18:17
  • What @RolandoMySQLDBA means is that you won't appreciably reduce loading time by disabling indices on an InnoDB database. Clustered indices are tree-based (= log-timed) data structures in their own right; your choices are building the whole index at n log n, or running n log-n inserts. You might save a bit in per-insert costs, but in practice I bet it's very little; there are sequential insert optimizations already in place. – Mark McKenna May 28 '13 at 12:32
  • I regularly build very large `innodb` tables with lots of indexes (hundreds of gigabytes / 10+ different indexes). There is a huge reduction in creation time when I `INSERT` into tables with no indexes. Obviously, "huge" depends greatly on the ultimate size of your table, the number and type of indexes, the types of data elements, etc. I usually build the table with nothing but the primary key (not needed if you're just loading data that already has a primary key id), and then build the indexes at the end. You may not notice a difference at 5MM rows and 1.5GB, but as your tables grow you will. – T. Brian Jones Apr 21 '17 at 21:44
  • Only remove the secondary indexes, not the primary key. Then insert the new data in perfect PK order. Use a separate (parallel) insert per partition, where feasible. Then add back all indexes in a single ALTER TABLE statement that includes the phrase ALGORITHM=INPLACE to avoid rewriting the data. – druid62 Dec 30 '20 at 12:30

A little late, but... forget all the answers here: there is no way to disable the indexes on InnoDB. Instead, drop them with ALTER TABLE tablename DROP INDEX whatever, bulk insert the data, then recreate them with ALTER TABLE tablename ADD INDEX whatever (whatever). The time spent recreating the indexes is around 1% of doing the bulk insert with the indexes in place: roughly 400,000 rows took 10 minutes with indexes and about 2 seconds without them. Cheers!
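
A sketch of that drop-load-recreate pattern; the index and column names here are made up, and ALGORITHM=INPLACE (suggested in a comment on the previous answer; MySQL 5.6+) avoids copying the whole table during the rebuild:

    -- Drop only secondary indexes; keep the PRIMARY KEY.
    ALTER TABLE tablename DROP INDEX idx_whatever;

    -- ... bulk insert the data here ...

    -- Recreate all dropped indexes in a single ALTER TABLE.
    ALTER TABLE tablename
        ADD INDEX idx_whatever (whatever),
        ALGORITHM=INPLACE;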

– KciNicK
  • Good point! Whoever downvoted this post, I would like to know why. – f b Nov 13 '18 at 20:45
  • What about the time taken to recreate the index after the insert? I would assume the indexes are there for a reason, so dropping them, inserting the records, then creating back the same index all takes time. Taking into consideration the time taken to drop and create the indexes, is it still faster? – iWantSimpleLife Mar 21 '19 at 04:05
  • When you create an index, it is surprisingly fast. It probably sorts the index in memory before writing it to the database; then it can write the entire index sequentially, rather than having to insert records into random areas of the index while creating each record for the main table. I've got a data-import utility, and dropping the indexes before I import, then re-creating them, is SO much faster. It's probably more efficiently stored too, as the page splits will be in the optimum position. – Rich S Jul 07 '20 at 13:25
  • The overhead depends on `innodb_buffer_pool_size`. See also "change buffering". – Rick James Jun 10 '21 at 06:37
  • Just remember that with the indexes dropped, *any constraints on the keys will not be enforced*, so the subsequent `CREATE INDEX` might well fail. – LSerni Nov 02 '21 at 18:39

To reduce the cost of re-calculating the indexes, you should insert the data either using LOAD DATA INFILE or using MySQL multi-row inserts, like

INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

That way, several rows are inserted with one statement.

How many rows you can insert with one statement depends on the max_allowed_packet MySQL setting.
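
If you hit that limit, you can inspect and (given sufficient privileges) raise it; a sketch, assuming you want 64 MB:

    SHOW VARIABLES LIKE 'max_allowed_packet';
    -- Applies to new connections; requires SUPER (or SYSTEM_VARIABLES_ADMIN
    -- on MySQL 8.0). The value below is an arbitrary example.
    SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;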

– staabm
  • Are you sure about this? So are you telling us to insert a million rows using a **single** insert statement? – Pacerier Feb 23 '15 at 14:47
  • I'm not telling you to insert "all the data" with one statement, but to insert several rows at once. In my apps I have used 500-1000 rows per insert without issues. It mostly depends on the number of rows and the amount of data you insert per row. – staabm Feb 23 '15 at 16:15
  • So what's a good number? Do you decide that by trial and error? – Pacerier Feb 24 '15 at 00:21
  • It's just like any other setting: the more rows you insert with one statement, the faster it gets. Having too many rows will make PHP run into an error regarding a too-large statement. You need to fiddle and monitor what works best for your app. – staabm Feb 24 '15 at 11:14
  • So you are saying we should set the limit to unlimited and then run the whole thing in **one** command? – Pacerier Mar 02 '15 at 03:40
  • See my previous comment. I explicitly didn't say to do things in one statement. If your server has an unlimited amount of memory, you can use unlimited huge/long/big statements. – staabm Mar 02 '15 at 08:28
  • It is [not](http://dev.mysql.com/doc/refman/5.7/en/packet-too-large.html) unlimited. – Drew Jun 21 '16 at 22:39
  • @Drew that's what I said. I will update the answer anyway, as the link you posted is useful. – staabm Jul 08 '16 at 07:42
  • Changing from individual `INSERT`s to multiple rows with one statement reduced the upload time from 10 minutes to 10 secs. – Adam Sep 07 '17 at 15:33
  • 100-1000 is a good number for batch inserts. Don't worry about finding the optimal number; there is very little difference across that range. (1 million rows is _very_ likely to hit some limit, even if you have a huge RAM.) – Rick James Jun 10 '21 at 06:39