43

Adding a new column or adding a new index can take hours and days for large innodb tables in MySQL with more than 10 million rows. What is the best way to increase the performance on large innodb tables in these two cases? More memory, tweaking the configuration (for example increasing the sort_buffer_size or innodb_buffer_pool_size), or some kind of trick? Instead of altering a table directly, one could create a new one, change it, and copy the old data the new, like this which is useful for ISAM tables and multiple changes:

CREATE TABLE tablename_tmp LIKE tablename;
ALTER TABLE tablename_tmp ADD fieldname fieldtype;
INSERT INTO tablename_tmp SELECT * FROM tablename;
ALTER TABLE tablename RENAME tablename_old;
ALTER TABLE tablename_tmp RENAME tablename;

Is it recommendable for innodb tables, too, or is it just what the ALTER TABLE command does anway?

0x4a6f4672
  • 27,297
  • 17
  • 103
  • 140
  • 1
    altering a table (especially a large one) will make mysql use a temp file to copy everything across with the new fields in place. either way, you'll be copying all the data anyways, so your method just adds more steps. – Marc B Jul 12 '12 at 10:42
  • 1
    you might want to check out this 6 minute/10 million row example here http://stackoverflow.com/questions/5302419/mysql-slow-query/5303805#5303805 – Jon Black Jul 19 '12 at 15:55

2 Answers2

72

Edit 2016: we've recently (August 2016) released gh-ost, modifying my answer to reflect it.

Today there are several tools which allow you to do online alter table for MySQL. These are:

Let's consider the "normal" `ALTER TABLE`:

A large table will take long time to ALTER. innodb_buffer_pool_size is important, and so are other variables, but on very large table they are all negligible. It just takes time.

What MySQL does to ALTER a table is to create a new table with new format, copy all rows, then switch over. During this time the table is completely locked.

Consider your own suggestion:

It will most probably perform worst of all options. Why is that? Because you're using an InnoDB table, the INSERT INTO tablename_tmp SELECT * FROM tablename makes for a transaction. a huge transaction. It will create even more load than the normal ALTER TABLE.

Moreover, you will have to shut down your application at that time so that it does not write (INSERT, DELETE, UPDATE) to your table. If it does - your whole transaction is pointless.

What the online tools provide

The tools do not all work alike. However, the basics are shared:

  • They create a "shadow" table with altered schema
  • They create and use triggers to propagate changes from original table to ghost table
  • They slowly copy all the rows from your table to shadow table. They do so in chunks: say, 1,000 rows at a time.
  • They do all the above while you are still able to access and manipulate the original table.
  • When satisfied, they swap the two, using a RENAME.

The openark-kit tool has been in use for 3.5 years now. The Percona tool is a few months old, but possibly more tested then the former. Facebook's tool is said to work well for Facebook, but does not provide with a general solution to the average user. I haven't used it myself.

Edit 2016: gh-ost is a triggerless solution, which significantly reduces master write-load on the master, decoupling the migration write load from the normal load. It is auditable, controllable, testable. We've developed it internally at GitHub and released it as open source; we're doing all our production migrations via gh-ost today. See more here.

Each tool has its own limitations, look closely at documentation.

The conservative way

The conservative way is to use an Active-Passive Master-Master replication, do the ALTER on the standby (passive) server, then switch roles and do the ALTER again on what used to be the active server, now turned passive. This is also a good option, but requires an additional server, and deeper knowledge of replication.

Shlomi Noach
  • 9,073
  • 1
  • 23
  • 20
  • The mentioned tools seem to be helpful. The innodb_buffer_pool_size parameter apparently is useful to avoid MySQL errors like MySQL Error 1206 ("The total number of locks exceeds the lock table size") for operations on large tables, but I am not sure if it has an impact on performance. Is it helpful to lock tables before large restructuring operations with "FLUSH TABLES WITH READ LOCK" and unlock them afterwards with "UNLOCK TABLES" ? – 0x4a6f4672 Jul 13 '12 at 06:50
  • The `innodb_buffer_pool_size` parameter is one of two critical parameters for InnoDB: it is the memory allocation for data+index cache. It has a huge impact on performance, regardless of the issue discussed here. If you're OK with locking, then why not issue the original `ALTER TABLE`? If you can afford the lockdown, go ahead and use the build in offer. – Shlomi Noach Jul 13 '12 at 08:29
  • I highly suggest openark. The alter tool is phenomenal. You get live progress of the data transfer in the terminal and no downtime. It sure beats executing a command and just knowing that it's going to run for who knows how long. My alter command to change a column datatype took roughly 2 hours for 100 million rows and multiple indices on Amazon RDS (standard disk, no PiOPS). – jaredstenquist Nov 27 '12 at 16:39
  • One important drawback on all these tools: they need to create their own triggers, so if you already have triggers created, it will fail since MySQL cannot create 2 triggers for same action on one table ;(. My work around to minimize disk time: use SSD disks :D – Cedric Simon Apr 02 '14 at 01:17
  • @CedricSimon. True; however this applies to AFTER triggers only. Also, I've noticed in many cases an AFTER trigger can be changed to a BEFORE trigger without code rewrite. – Shlomi Noach Apr 02 '14 at 04:22
  • In some conditions, create triggers in a large rows table can block the table for a considerable time. I had that experience with a table with 20 millions rows, and the table was locked for 3 minutes. https://dba.stackexchange.com/questions/140997/why-does-create-trigger-lock-a-table-in-mysql – Martin Da Rosa Jun 29 '17 at 20:47
0

Rename screws up referenced tables.

If you have say table_2 which is child to tablename, on ALTER TABLE tablename RENAME tablename_old; table_2 will start pointing to tablename_old.

Now without altering table_2 you cannt point it back to tablename. You have to keep on going making alters in every child and referenced table.

A.O.
  • 3,733
  • 6
  • 30
  • 49
Lokesh
  • 1