3

I have a utf8_general_ci database that I'm interested in converting to utf8_unicode_ci. I've tried the following commands

    ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci; (for every single table)

But that seems to change the charset for future data but doesn't convert the actual existing data from utf8_general_ci to utf8_unicode_ci. Is there any way to convert the existing data to utf8_unicode_ci?

Khaleal
  • 876
  • 3
  • 10
  • 15
  • I think this should help - https://stackoverflow.com/questions/18445969/how-to-change-collation-of-all-rows-from-latin1-swedish-ci-to-utf8-unicode-ci – Dhanashree Oct 24 '17 at 10:02

3 Answers3

2

SHOW CREATE TABLE to see if it really set the CHARACTER SET and COLLATION on the columns, not just the defaults.

What was the CHARACTER SET before the ALTERs?

Do SELECT col, HEX(col) ... for some field that should have utf8 in it. This will help us determine if you really have utf8 in the table. The encoding for characters is different based on CHARACTER SET; the HEX helps discover such.

The ordering (WHERE, ORDER BY, etc) is controlled by COLLATION. The indexes probably had to be rebuilt based on your ALTER TABLE. Did big tables with indexes take a 'long' time to convert?

To actually see the difference between utf8_general_ci and utf8_unicode_ci, you need a "combining accent" or, more simply, the German ß versus ss:

mysql> SELECT 'ß' = 'ss' COLLATE utf8_general_ci,
              'ß' = 'ss' COLLATE utf8_unicode_ci;
+-------------------------------------+-------------------------------------+
| 'ß' = 'ss' COLLATE utf8_general_ci  | 'ß' = 'ss' COLLATE utf8_unicode_ci  |
+-------------------------------------+-------------------------------------+
|                                   0 |                                   1 |
+-------------------------------------+-------------------------------------+

However, to test that in your tables, you would need to store those values and use WHERE or GROUP_CONCAT or something else to determine the equality.

What 'proof' do you have that the ALTERs failed to achieve the collation change?

(Addressing other comments: REPAIR should be irrelevant. CONVERT TO tells the ALTER to actually modify the data, so it should have done the desired action.)

Rick James
  • 135,179
  • 13
  • 127
  • 222
1

You have to change the collation of every field in every table. As you say, the collation of the table is only the default value for fields created later, and the collation of the database is only the default value for tables created later.

Lorenz Meyer
  • 19,166
  • 22
  • 75
  • 121
0

As Lorenz Meyer said, the collation of the table is only the default value for fields created later and you need to set the defaults for the columns explicitly too.

Such a change looks like:

ALTER TABLE mytable CHANGE mycolumn mycolumn varchar(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci
user2587656
  • 309
  • 3
  • 5