All the MySQL tables in my PHP web application are MyISAM with utf8 encoding. Since records can be generated from a companion app while it's offline, my table keys are randomly generated, alphanumeric VARCHARs; these fields are set to binary with utf8_bin encoding so they can be case-sensitive.
I recently decided to change the encoding of all my text fields, to support emojis that some users like to enter. I went ahead and changed all utf8 fields to utf8mb4, including the keys. I immediately started seeing performance issues, where complex SELECT queries on one of the larger tables took more than a minute, and then other queries queued up waiting for table locks. I changed the encoding of the primary key field on that table back to utf8, and performance returned to normal. A couple days later, I changed that one field to utf8mb4 again, the queries started queueing up again, and I changed it back to restore the normal performance.
So this runs smoothly:
`ID` varchar(8) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT ''
But this causes problems:
`ID` varchar(8) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT ''
Everything I've read says utf8 and utf8mb4 should have equivalent performance, but I'm seeing a distinct difference in my case. Does this make sense?
It's not really a problem to keep the key fields at utf8, since I don't foresee ever using more than simple alphanumeric characters there. But I would have liked to have all the fields set to the same encoding just for consistency and simplicity of maintenance (don't have to remember to set user-populated fields to one encoding and key fields to another encoding).
Regarding the comment from @MandyShaw
When I work with the database with the Sequel Pro Mac app, the console constantly shows pairs of SET NAMES 'utf8'
and SET NAMES 'utf8mb4'
entries, so that does suggest not everything is set correctly. However, here's what I have currently:
MySQL [(none)]> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
I read that character_set_system
can't be changed from utf8 and character_set_filesystem
should be binary.
Sequel Pro's connection encoding was set to Autodetect, but when I change it explicitly to utf8mb4, then open a new connection, I still see all those encoding changes in the console.
Is there something else I need to change to use this encoding consistently?