1

I'm trying to upgrade the character encoding from latin1 to utfmb8 in a MySQL database (part of a Rails 3 app). I've done some reading but I'm stuck on my last two tables: users and trigrams.

Trying to update the users table gets the dreaded ERROR 1071:

mysql> ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

Following this answer, I can work around that by dropping and recreating the index on the problematic VARCHAR(255) column:

mysql> DROP INDEX index_users_on_remember_token ON users;
mysql> CREATE INDEX index_users_on_remember_token ON users (remember_token(191));

I'm worried that'll screw things up down the line, since I don't really understand indexes. But putting that aside for the moment...

My main problem is how to do something similar for my last table: trigrams. Here's some info on the trigrams table:

mysql> SHOW CREATE TABLE trigrams;
CREATE TABLE `trigrams` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `trigram` varchar(3) DEFAULT NULL,
    `score` smallint(6) DEFAULT NULL,
    `owner_id` int(11) DEFAULT NULL,
    `owner_type` varchar(255) DEFAULT NULL,
    `fuzzy_field` varchar(255) DEFAULT NULL,
    PRIMARY KEY (`id`),
    KEY `index_for_match` (`owner_id`,`owner_type`,`fuzzy_field`,`trigram`,`score`),
    KEY `index_by_owner` (`owner_id`,`owner_type`)
    ) ENGINE=InnoDB AUTO_INCREMENT=41531 DEFAULT CHARSET=latin1

mysql> SHOW INDEXES IN trigrams;
+----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name        | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| trigrams |          0 | PRIMARY         |            1 | id          | A         |       35502 |     NULL | NULL   |      | BTREE      |         |               |
| trigrams |          1 | index_for_match |            1 | owner_id    | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_for_match |            2 | owner_type  | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_for_match |            3 | fuzzy_field | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_for_match |            4 | trigram     | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_for_match |            5 | score       | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_by_owner  |            1 | owner_id    | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
| trigrams |          1 | index_by_owner  |            2 | owner_type  | A         |       35502 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+-----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

My question is: how do I get around the following error?

mysql> ALTER TABLE trigrams CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes

What changes do I make to trigrams? And what are the risks of making those changes?

(If it's relevant: the trigrams table was automatically created by some third-party code, the fuzzily gem.)

Community
  • 1
  • 1
Jonathan
  • 1,075
  • 2
  • 10
  • 18

1 Answers1

0

Try

CREATE TABLE tmp LIKE triggrams;
SET GLOBAL innodb_large_prefix=1;
ALTER TABLE tmp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
LOCK TABLE trigrams, tmp WRITE;
INSERT INTO tmp SELECT * FROM trigrams;
RENAME TABLE trigrams to trigrams_old, tmp to trigrams;
UNLOCK TABLES;

The second option is to reduce the indexed prefix of varchar columns.

Kostja
  • 1,607
  • 10
  • 17
  • Second step fails: mysql> ALTER TABLE tmp CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes – Jonathan Apr 13 '15 at 18:25
  • Ugh, OK. But the basic idea is that with a tmp table aside, you could play around with table definition without affecting your original table or data in it. Why not use your own CREATE TABLE statement from the question, but replace default charset=latin1 with default charset=utf8mb4? – Kostja Apr 13 '15 at 18:29
  • Right, ok: I created a `tmp` table based on the `CREATE TABLE` statement for `trigrams`, but with `VARCHAR(191)` in place of `VARCHAR(255)` in both places, and with `utf8mb4` in place of `latin1`. Seems to be working, thanks for the suggestion. Strangely, just replacing the indexes on the existing `trigrams` table with indexes of length `191` doesn't seem to work like it did for the `users` table. Not sure why. – Jonathan Apr 13 '15 at 18:55
  • index_for_match index covers two varchar(255) fields, which gives you in total 510 * 4 = up to 2040 bytes of index key. By default the limit of index key is AFAIR 1000 bytes, you can change it with innodb_large_prefix global variable to 3000 bytes. – Kostja Apr 13 '15 at 18:56
  • This is essentially a [duplicate](http://dba.stackexchange.com/questions/96622/gotchas-converting-latin1-swedish-ci-to-utf8mb4-unicode-ci), which presents 3 answers. – Rick James Apr 16 '15 at 06:40
  • Nothing about innodb_large_prefix in the duplicate question. – Kostja Apr 16 '15 at 06:45
  • Get 5.5.14 or 5.6.3 (or later). `SET GLOBAL innodb_file_format=Barracuda;` `SET GLOBAL innodb_file_per_table=ON;` `SET GLOBAL innodb_large_prefix=ON;` `ALTER TABLE tbl ROW_FORMAT=DYNAMIC;` _Then_ maybe the ALTER will work. – Rick James Apr 20 '15 at 00:13