I'm trying to convert a database to use utf8mb4 instead of utf8. Everything is going fine except one table:
CREATE TABLE `search_terms` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`search_term` varchar(128) NOT NULL,
`time_added` timestamp NULL DEFAULT NULL,
`count` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `search_term` (`search_term`),
KEY `search_term_count` (`count`)
) ENGINE=InnoDB AUTO_INCREMENT=198981 DEFAULT CHARSET=utf8;
Basically all it does is save an entry every time somebody searches something in a form so we can track the number of searches, very simple.
There's a unique index on search_term
because we want to only have one row per search term and instead increment the count value.
However when converting to utf8mb4 I am getting duplicate entry errors. Here is the command I am running:
ALTER TABLE `search_terms` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Looking in the database I can see various examples like this:
fm2012
fm2012
fm2012
In it's current utf8 character set, these are all being treated as unique and exist within the database without ever having an issue with the unique index on search_term
.
But when converting to utf8mb4 they are now being considered equal and throwing an error due to that index.
I can figure out how to merge these together easily enough, but i'm concerned this may be a symptom of a greater underlying problem. I'm not really sure how this has happened or what the consequences may be, so my questions are a bit vague:
- Why is utf8mb4 treating these differently to utf8?
- What are the possible consequences?
- Is there someway I can do a conversion so things like "fm2012" never appear in my database and I only have "fm2012" (I am also using Laravel 5.1)