8

Im trying to convert my UTF8 MySQL 5.5.30 database to UTF8MB4. I have looked at this article https://mathiasbynens.be/notes/mysql-utf8mb4 but have some questions.

I have done these

ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

The last one was manually done with 62 tables, one of them gave me this warning

13:08:30 ALTER TABLE bradspelold.games CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci 101289 row(s) affected, 2 warning(s): 1071 Specified key was too long; max key length is 767 bytes 1071 Specified key was too long; max key length is 767 bytes Records: 101289 Duplicates: 0 Warnings: 2 3.016 sec

  1. Is this a problem? What could I do to fix it?

The next step is

ALTER TABLE table_name CHANGE column_name column_name
         VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
  1. Im not sure about the command, why is there 2 column_name?
  2. Should I do this only on the VARCHAR(191) columns? I dont think I have any of them?
  3. Do you know of any more artickels like this that explains more id detail why and how?

Edit :

Table Show games

CREATE  TABLE `games` (
        `id` int(10) unsigned NOT NULL DEFAULT \'0\',
        `name` varchar(255) NOT NULL,
        `description` mediumtext,
        `yearPublished` datetime NOT NULL,
        `minPlayers` int(10) unsigned NOT NULL,
        `maxPlayers` int(10) unsigned NOT NULL,
        `playingTime` varchar(127) NOT NULL,
        `grade` double NOT NULL DEFAULT \'0\',
        `updated` datetime NOT NULL,
        `forumParentId` int(10) unsigned DEFAULT \'0\',
        `lastVisited` datetime DEFAULT NULL,
        `inactivatedDate` datetime DEFAULT NULL,
        `bggGrade` double DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `inactivatedDate` (`inactivatedDate`),
        KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8'

Edit 2:

    'CREATE TABLE `forum_threads` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `title` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `createrId` int(10) unsigned DEFAULT NULL,
      `replys` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostUserId` int(10) unsigned DEFAULT NULL,
      `lastPostId` int(10) unsigned DEFAULT NULL,
      `forumId` int(10) unsigned DEFAULT NULL,
      `visits` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `createrNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
      `solved` tinyint(1) NOT NULL DEFAULT ''0'',
      `locked` tinyint(1) NOT NULL DEFAULT ''0'',
      `lockedByUserId` int(10) unsigned NOT NULL DEFAULT ''0'',
      `lockedDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
      `alteredUserId` int(10) unsigned DEFAULT NULL,
      `glued` tinyint(1) NOT NULL DEFAULT ''0'',
      `pollId` int(10) unsigned DEFAULT NULL,
      `facebookPostId` bigint(20) DEFAULT NULL,
      `facebookImportedDate` datetime DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `FK_forum_threads_1` (`forumId`),
      KEY `FK_forum_threads_2` (`pollId`),
      KEY `createdDate` (`createdDate`),
      KEY `createrId` (`createrId`),
      KEY `lastPostCreated` (`lastPostCreated`),
      CONSTRAINT `FK_forum_threads_1` FOREIGN KEY (`forumId`) REFERENCES `forum` (`id`) ON DELETE CASCADE
    ) ENGINE=InnoDB AUTO_INCREMENT=4306 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'

'CREATE TABLE `forum` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `description` varchar(150) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `createdDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `threads` int(10) unsigned NOT NULL DEFAULT ''0'',
  `createrId` int(10) unsigned DEFAULT NULL,
  `lastPostUserId` int(10) unsigned DEFAULT NULL,
  `lastThreadId` int(10) unsigned DEFAULT NULL,
  `parentForumId` int(10) unsigned DEFAULT NULL,
  `lastPostNickName` varchar(30) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `lastPostCreated` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `lastThreadTitle` varchar(80) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '''',
  `alteredDate` datetime NOT NULL DEFAULT ''0000-00-00 00:00:00'',
  `alteredUserId` int(10) unsigned DEFAULT NULL,
  `placeOrder` int(10) unsigned NOT NULL DEFAULT ''0'',
  `separator` tinyint(1) NOT NULL DEFAULT ''0'',
  `rightLevel` int(10) unsigned NOT NULL DEFAULT ''1'',
  `createChildForum` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  `createThreads` tinyint(3) unsigned NOT NULL DEFAULT ''1'',
  PRIMARY KEY (`id`),
  KEY `Index_1` (`id`,`parentForumId`)
) ENGINE=InnoDB AUTO_INCREMENT=375 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci'
Rick James
  • 135,179
  • 13
  • 127
  • 222
Banshee
  • 15,376
  • 38
  • 128
  • 219

3 Answers3

5
  1. There are limits on the size of an INDEX. You bumped into the limit because utf8mb4 needs up to 4 bytes per character, where as utf8 needs only 3. Meanwhile the INDEX size limit is in bytes.

The 'solution' is to decide what to do about the over-sized index. (more below)

2.

ALTER TABLE t CHANGE col col ...

is the same as the more logical

ALTER TABLE t MODIFY col ...

The former allows you to change the name of the column, hence two copies of the column name when you don't need to change the name.

  1. Quite likely you had VARCHAR(255) which takes 767 bytes in utf8 (3*255+2; the "2" is the size of the length field). The equivalent in the 4-byte utf8mb4 would be (191) (4*191+2=766; not room for more than 191).

  2. I have not seen an article about it. I suspect that what I just said is most of what needs to be said.

So...

Plan A: Do you have foo VARCHAR(255) and it was utf8? Is the data in it always (now and in the future) shorter than 191 characters? If so, then simply do the ALTER.

Plan B: If you need more than 191, do you really need the INDEX? DROP INDEX may be an alternative.

Plan C: Or, you could use a "prefix" index: INDEX(foo(191)), while leaving it VARCHAR(255). Usually "prefix" indexes are useless, but you might have a use case for which it works.

To discuss this further, please provide SHOW CREATE TABLE for the table in question, and discuss the meaning of that particular field and its INDEX.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks! You talk about INDEX but all my Index are of simple int? VarChars are only used for titles, username and so on. – Banshee Mar 23 '15 at 09:01
  • 1
    Hmmmm, that's strange. Let's see `SHOW CREATE TABLE bradspelold.games`. – Rick James Mar 23 '15 at 17:07
  • It's complaining about `KEY name (name)`, which is `VARCHAR(255)` – Rick James Mar 24 '15 at 17:25
  • And that should be 191 instead? I dont really understand how smaller can be better? What if I need bigger? – Banshee Mar 24 '15 at 19:04
  • 1
    `SELECT MAX(CHAR_LENGTH(name)) FROM t` -- See how long you need (so far). I suspect it is much below 191. If, on the other hand, you get 255, then you have already truncated some `name`s. If it seems you do need more than 191, declare it as big as needed, then change the index to `INDEX(name(191))`. This is a "prefix index"; it may be better than nothing. – Rick James Mar 24 '15 at 20:55
  • Thanks, I will check this. Usually 1 byte = 1 char, so when the VarChar says 255 I thought that it was 255 chars. With UTF8mb4 it seems like every char is bigger? I have read something about that in UFT8 it is 3 so I can feet 85 chars in to a 255 field? and in UFT8mb4 I can feet only 63 chars? I dont really get why I need to make it smaller? It makes no sense? I it demanded bigger VarChar I would have understood? – Banshee Mar 25 '15 at 12:29
  • 1
    With utf8mb4, each _character_ is between 1 and 4 _bytes_. 1 byte for English / ascii, 2 bytes for European accented letters, 3 bytes for most of Asia, and 4 bytes for some of Chinese. The max (4) is used for limiting INDEX usage. utf8 is a subset of utf8mb4. utf8 is missing the 4-byte chars. 'abcde' occupies only 5 bytes. – Rick James Mar 25 '15 at 16:29
  • 1
    I agree that they should have tried to raise the index limit when they added utf8mb4. But they did not. – Rick James Mar 25 '15 at 16:30
  • Sorry for the delay, I have checked and the MAX charlength of the name field is 248 so I neeed to make it bigger. Is there any problems to just set it to 512? – Banshee Mar 31 '15 at 08:38
  • 1
    Anything between 192 and 64K is virtually the same. So, 512 would be fine. Did you look into Plan B (getting rid of the index)? Or did you find a query that seems to need that index? You could show us the SELECT, or you could simple go with Plan C (prefix index) and hope that it works well enough. – Rick James Mar 31 '15 at 17:00
  • Thanks, that solved the first problem but then I run in to the same problem but involving another table. How did you know that it was the column name in games that was wrong? I do have a couple of VarChars in the this other tables as well but thay are of size 150 and 30. See edit 2. – Banshee Apr 04 '15 at 20:19
  • 1
    I looked at all then INDEXes (including UNIQUE and PRIMARY); then matched up the column declarations. Any VARCHAR(...) between 192 and 255 was in trouble. Namely games.name, but nothing else in those 3 CREATEs. 150 and 30 is smaller <= 191, so won't exceed 767. 2+4*150 is only 602. – Rick James Apr 05 '15 at 01:29
  • 1
    The error message "13:08:30 ALTER TABLE **bradspelold.games** ..." told me to look at `games`. – Rick James Apr 05 '15 at 01:30
4
DB="database_name"
USER="mysql_user"
PASS="mysql_password"
(
    echo 'ALTER DATABASE `'"$DB"'` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
    mysql -p$PASS -u $USER "$DB" -e "SHOW TABLES" --batch --skip-column-names \
    | xargs -I{} echo 'ALTER TABLE `'{}'` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;'
) \
| mysql -p$PASS -u $USER "$DB"
  • To get the script you work open your command line and use the following steps:

    1. nano convert_to_utf8mb4.sh
    2. paste the script & save
    3. sudo chmod 755 convert_to_utf8mb4.sh (in terminal)
    4. run the script by type ./convert_to_utf8mb4.sh

    Yes, collation has been changed!

Mahbub Tito
  • 1,057
  • 8
  • 7
2

this is an old question but following some of the answers here 5 years later is, as I discovered, a bad idea. Do not alter the size of your VARCHAR fields, you could damage your data and break everything.

In current versions of MySQL and MaraiDB add this to your config and it will support the larger keys necessary for UTF8mb4

innodb_large_prefix=1

I also suggest adding innodb_file_per_table =1 innodb_file_format=Barracuda

then the conversion will happen without errors/warnings about the key length

J_D
  • 740
  • 8
  • 17
EricG
  • 21
  • 1