11

I'm trying to bulk load around 12m records into a InnoDB table in a (local) mysql using LOAD DATA INFILE (from CSV) and finding it's taking a very long time to complete.

The primary key type is UUID and the keys are unsorted in the data files.

I've split the data file into files containing 100000 records and import it as:

mysql -e 'ALTER TABLE customer DISABLE KEYS;'
for file in *.csv
    mysql -e "SET sql_log_bin=0;SET FOREIGN_KEY_CHECKS=0; SET UNIQUE_CHECKS=0;
    SET AUTOCOMMIT=0;LOAD DATA INFILE '${file}' INTO TABLE table 
    FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';  COMMIT"

This works fine for the first few hundred thousand records but then the insert time for each subsequent load seems to keep growing (from around 7 seconds to around 2 minutes per load before I killed it.)

I'm running on a machine with 8GB RAM and have set the InnoDB parameters to:

innodb_buffer_pool_size =1024M
innodb_additional_mem_pool_size =512M
innodb_log_file_size = 256M
innodb_log_buffer_size = 256M

I've also tried loading a single CSV containing all rows with no luck - this ran in excess of 2 hours before I killed it.

Is there anything else that could speed this up as this seems like an excessive time to only load 12m records?

Michael
  • 210
  • 2
  • 7
  • [DISABLE KEYS doesn't appear to work in InnoDB](http://serverfault.com/questions/291100/with-mysql-how-long-does-an-alter-table-disable-keys-statement-last), the answers about dropping and adding secondary indexes are good. – KCD Mar 26 '12 at 05:49
  • Can't disable the `PRIMARY KEY`. Or, if you could, the "repair" would cost an awfully long time. – Rick James Sep 12 '16 at 19:06

2 Answers2

7

If you know the data is "clean", then you can drop indexes on the affected tables prior to the import and then re-add them after it is complete.

Otherwise, each record causes an index-recalc, and if you have a bunch of indexes, this can REALLY slow things down.

cdeszaq
  • 30,869
  • 25
  • 117
  • 173
  • In this case it will not help because the offending index is the primary key. For innodb the primary key is implemented as clustered index which stores the data and the primary key index in the same structure. – Assaf Karmon Jan 09 '12 at 15:32
  • Since the OP didn't indicate, and it is generally a good idea to have alternate indexes on any table that stores non-trivial information (and a lot of it), I assumed that there are other indexes present. Dropping those extra indexes _will_ help the insert speed, since it will need to do _fewer_ calculations. I agree that the primary key's index is likely to be causing trouble as well, but removing indexes will still speed up the insert. – cdeszaq Jan 09 '12 at 16:21
  • yes the table had indexes on 4 other columns. dropping these did improve performance but only delayed the point where it essentially grinds to a halt when doing the inserts. – Michael Jan 10 '12 at 08:56
  • It's always possible to turn indexes off before the `LOAD DATA INFILE` query, and then turn them back on after your done with the query. This way you only update the indexes once instead of x times and your `LOAD DATA INFILE` executes a lot faster. – srchulo Oct 17 '12 at 23:33
2

Its always hard to tell what is the cause of performance issues but these are my 2 cents: Your key being a uuid is randomly distributed which makes it hard to maintain an index. The reason being that keys are stored by range in a file system block, so having random uuids follow each other makes the OS read and write blocks to the file system without leveraging the cache. I don't know if you can change the key, but you could maybe sort the uuids in the input file and see if that helps. FYI, to understand this issue better I would take a look at this blog post and maybe read this book mysql high performance it has a nice chapter about innodb clustered index. Good Luck!

Assaf Karmon
  • 915
  • 1
  • 10
  • 23
  • yes it seems the UUID based primary key was causing most of the problems - changing it in the data file to something that is just a sorted string sequence gave a speed up of around 6. – Michael Jan 10 '12 at 17:56
  • `innodb_buffer_pool_size = 5G` would also help. – Rick James Sep 12 '16 at 19:07
  • More discussion on [_the evils of indexing UUIDs_](https://mariadb.com/kb/en/guiduuid-performance/). And it includes a workaround _if_ they are "type 1". – Rick James Sep 12 '16 at 19:08