I have a lot of data in mysql in latin1 format, and I'm trying to convert everything to utf8, but so far I didn't find the right collation to convert it correctly. Some of the data are international names, with a lot of accent, and characters from various language and a primary key on these fields. I have a simple test case that goes like this:
CREATE TABLE utf8_test ( value varchar(30), PRIMARY KEY(value) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci;
INSERT INTO utf8_test VALUES ('e');
INSERT INTO utf8_test VALUES ('é');
INSERT INTO utf8_test VALUES ('è');
INSERT INTO utf8_test VALUES ('ü');
INSERT INTO utf8_test VALUES ('u');
INSERT INTO utf8_test VALUES ('y');
INSERT INTO utf8_test VALUES ('ÿ');
It's a simple test to demonstrate the problem, but real data is not limited to this.
So far, only utf8_bin can accept everything without duplicate error, but I can't use it, because it's case-sensitive. Am I missing something ?
note: some of the tables have more than a dozen millions rows, so performance is a factor.