0

After upgrading MySQL 5.7 -> 8.0, we realized we needed to convert all tables from latin1 to utf8. After a long googling, I found that the best solution for that is to convert the columns with text datatypes first to VARBINARY (or BLOB for texts) and then back to VARCHAR (text) with the correct CHARACTER SET.

Running the command:

ALTER TABLE [table] MODIFY COLUMN [column] VARBINARY([size]);

runs fine and the column got changed to the VARBINARY type.

But the conversion back:

ALTER TABLE [table] MODIFY COLUMN [column] VARCHAR([size]) CHARACTER SET utf8mb4;

returns an error:

Incorrect string value: '\xF3nica' for column '[column]' at row [index]

The value is random across tables and columns, but every time it happens if the field has some non-ANSI character (like accents, non-English letters, etc.). This particular error happens with the first name column containing Mónica (xF3 = ó). Why it incorrect for UTF8mb4?

I'm also frustrated about this error because when we used to run MySQL 5.7 with the default character set latin1, we had no errors with non-Latin characters. Everything was working fine. The problem started after the MySQL version upgrade.

Stu
  • 30,392
  • 6
  • 14
  • 33
  • Well, the UTF-8 encoding for `ó` is `00F3` rather than `F3`. If your binary data is Windows-1252, nothing in the procedure you've shared allows MySQL to know that. Modern MySQL versions default to UTF-8, so you'll have to set `latin1` somewhere. – Álvaro González Jun 12 '23 at 08:34
  • BEWARE: There are several ways to convert a table; using the wrong one makes things worse. See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored and 5 diff conversions: https://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases – Rick James Jun 26 '23 at 00:20

1 Answers1

1

The default character set in 5.7 was latin1, but is utf8mb4 in 8.0. If you explicitly set the character set for your connection to latin1, it would function exactly as it did prior to the upgrade, as there would be no implicit conversion due to the differing character sets between connection and column.

To get the column changed to utf8mb4, try converting the column content before changing the column data type:

ALTER TABLE [table] MODIFY COLUMN [column] VARBINARY([size]);
UPDATE [table] SET [column] = CONVERT([column] USING utf8mb4);
ALTER TABLE [table] MODIFY COLUMN [column] VARCHAR([size]) CHARACTER SET utf8mb4;

You should be able to skip the VARBINARY step:

UPDATE [table] SET [column] = CONVERT(CAST([column] AS BINARY) USING utf8mb4);
ALTER TABLE [table] MODIFY COLUMN [column] VARCHAR(50) CHARACTER SET utf8mb4;
user1191247
  • 10,808
  • 2
  • 22
  • 32
  • 1
    Worked, thanks a lot. I'm unable to skip conversion to VARBINARY because when updating the column it still returns `Incorrect string...` but it's ok. – Vitaly Radchik Jun 12 '23 at 10:15