1

I have a MySQL database, full of data, with collation = latin1 - default collation.

If I change that collation to utf8 - default collation, a superset of the collation above,
would there be any corruption in the existing data?

Mr.President
  • 163
  • 1
  • 9

1 Answers1

2

Your data will not be corrupted, but you must take care to update the collation of all tables and columns and not just change the default collation of the database as this would only apply to new tables. If you do not alter all existing tables you might encounter some odd behaviour when running queries where data are compared.

I would recommend that you run the following queries as taken from David Whittaker's answer

Heres how to change all databases/tables/columns. Run these queries and they will output all of the subsequent queries necessary to convert your entire schema to utf8. Hope this helps!

-- Change DATABASE Default Collation

SELECT DISTINCT concat('ALTER DATABASE `', TABLE_SCHEMA, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like  'database_name';

-- Change TABLE Collation / Char Set

SELECT concat('ALTER TABLE `', TABLE_SCHEMA, '`.`', table_name, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like 'database_name';

-- Change COLUMN Collation / Char Set

SELECT concat('ALTER TABLE `', t1.TABLE_SCHEMA, '`.`', t1.table_name, '` MODIFY `', t1.column_name, '` ', t1.data_type , '(' , t1.CHARACTER_MAXIMUM_LENGTH , ')' , ' CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.columns t1
where t1.TABLE_SCHEMA like 'database_name' and t1.COLLATION_NAME = 'old_charset_name';
Ruan
  • 334
  • 1
  • 10
  • Oh, very nice, thank you! I thought I would just change the database's default collation, but, sadly enough I'll have to do it on tables/columns too. – Mr.President Dec 18 '19 at 12:41
  • 1
    Actually, only the columns matter. The `DATABASE` setting is only a _default_ for new tables. The `TABLE` setting is only a _default_ for new columns. – Rick James Dec 18 '19 at 19:43
  • 2
    utf8mb4 would be better than utf8. – Rick James Dec 18 '19 at 19:48
  • @RickJames Thanks a lot, sir. I deal with ASCII only, so I went with the default, but, now I need to store Argon2 encoded hash. I was thinking to change only that table which stores hash ( to utf8mb4) and leave other tables. Would it be a good design? – Mr.President Dec 19 '19 at 04:06
  • 1
    @Mr.President - All hashes that I know of are either ascii or binary; utf8mb4 is overkill, and conceivably be 'wrong'. Any binary string can be easily converted to ascii (hex) or base64, at a cost of taking more bytes. – Rick James Dec 19 '19 at 05:04
  • @RickJames Ok, Sir, I'll check on my side. When I checked the returned hash from Argon2, it showed as ``. So, I thought may be I need to store it differently. Also, am reading your blog, very informative, many thanks. – Mr.President Dec 19 '19 at 05:18
  • 1
    @Mr.President - (I am not familiar with Argon2.) I would probably try to convert to `BINARY(n)` if fixed length, `VARBINARY(n)` if variable, or `BLOB` if potentially quite long. Those are smaller than the equivalent hex (by 1:2) or base64 (6:8). – Rick James Dec 19 '19 at 05:41