My company faced an unpleasant issue when we upgraded our MySQL DB hosted in AWS RDS from version 5.7 to 8.0.
Initially, we had MySQL 5.7, where the previous dev team left the charsets and collations default. Charset was latin1
, and the collation was latin1_swedish_ci
. This database is used by PHP and Node.JS apps to store the data without any encoding transformations. Both worked fine, storing the data containing non-Latin characters, such as currency symbols and non-Latin languages.
The problems began when we upgraded the version of the MySQL engine to 8.0 using AWS blue/green update. The data remains the same, but all non-Latin characters start to display as question marks (????) or gibberish. Changing the charset to utf8mb4
and collation to utf8mb4_900_ai_ci
doesn't help. The data seems unchanged and returned gibberish even when issuing the command SET NAMES utf8
or SET NAMES utf8mb4
.
Worse than that, after converting tables to utf8
, the data that comes to clients' DB seems corrupted (question marks and gibberish again). And previously saved data also become unreadable.
The only thing that seems to help is issuing the command SET NAMES latin1
. After that, we get most of the data, but some chars, like the euro symbol (€) and other chars from different languages, don't store correctly in the DB.
The strange thing is that €-was stored in MySQL 5.7 before without problems and the SET NAMES ...
command.
I read similar topics describing how to convert table charset and collation, but the problem is that we can convert tables using SQL statements, but the data remains unconverted. So it returns gibberish even after conversion.
Also, I want to mention that during our attempts to convert the data using MySQL CONVERT TO CHARACTER SET
commands, we got the situation that now part of the data comes to the DB using utf8 and then can't be converted back to latin1
(to perform conversion via BLOB fields). It now returns an Illegal character \xXX
error. Does anybody know how to convert those values not bringing the DB back to latin1
?
Update:
This is the result of SELECT id, format, HEX(format) FROM stores
:
The table already have charset
utf8mb4
(we run ALTER TABLE stores CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
). But as you can see the data isn't converted. All rows should have the €{{amount_with_comma_separator}}
value, but only one has it correctly.
The problem is that after conversion of the table to utf8mb4
, the data retuned the same regardless of the SET NAMES [utf8|latin1]
I will appreciate any help.