It would be even better to move to the utf8mb4 collations instead of utf8.
Yes, it is probably wise to move almost everything to utf8mb4.
If you have MySQL 5.5 or 5.6, there could be issues with utf8mb4. However, there are workarounds.
There are a few ways of converting tables, but only one way will be correct. And which way is "correct" depends on whether the data in the table is messed up. SELECT HEX(col) ...
will give FCDCEF80E1E0E9C9
for üÜï€áàéÉ
if the CHARACTER SET
(that is, the "encoding") is latin1. For utf8 or utf8mb4, the hex will be C3BCC39CC3AFE282ACC3A1C3A0C3A9C389
.
See SHOW VARIABLES LIKE 'chara%';
-- some of those should not be changed. The ones to focus on are the three set by SET NAMES utf8mb4;
. (Do the SHOW
again, to see which ones changed. Then another SET
to get them back.)
As for which COLLATION
to go with, ..._general_ci
is the least interesting. However, until you get into the details of comparisons or ordering for various languages, the collation won't matter much.
The charset and collation of a database are the defaults for newly created tables. Instead of worrying about that, get in the habit of being explicit on CREATE TABLE
. The charset and collation of a table are the defaults for newly columns in the table.
Then there is the charset and collation on the connection. This is important because it declares the encoding (charset) of the byte in the client. The collation of the connection may affect the question you posed. (Sorry, but there was a lot of groundwork to lay before getting to your Question.) So make the collation consistent with your databases and tables.