0

I need to convert an existing database from Latin1 to UTF8. Looking here I think I see all that I need to do. However, other links suggest you have to actually export the data, then re-import in order to preserve data. Which is actually required to prevent any data loss? If there is no way to really prevent any data loss (loss being different encoding for characters that do not map between the two), what is the best approach to catch all instances of this? Will I simply be able to do a diff, or is there a faster/better way?

Community
  • 1
  • 1
WildBill
  • 9,143
  • 15
  • 63
  • 87

1 Answers1

0

I believe that all 256 latin1 encodings have a mapping in utf8. (But certainly not the other direction.) So, you won't lose data if you have suitable code. I recommend you test the conversion before risking the conversion.

But first, we need to make sure you are doing the right conversion. You have a latin1 column that contains one-byte accented letters in latin1? Find a cell with an accent, SELECT LENGTH(col), CHAR_LENGTH(col) ... If the lengths are the same, then it is a one-byte encoding, such as latin1.

SELECT HEX(col) ... -- e-acute (é) would be E9 in latin1 or C3A9 in utf8. This is important -- If you have E9, then you need to convert all the text from latin1 to utf8 and change the declaration of the column. If you already have C3A9, you already have utf8 masquerading as latin1; you need to change the declaration without modifying the bytes.

First case: ALTER TABLE tbl CONVERT TO CHARACTER SET utf8; -- This actively changes the necessary bytes in the columns.

Second case: alter table t modify c varbinary(...); alter table t modify c varchar(...) charset utf8;

More discussion of that and other cases: mysql.rjweb.org/doc.php/charcoll

Rick James
  • 135,179
  • 13
  • 127
  • 222