0

A month ago, we migrated our database from DB2 to MySQL. The original data was in utf8, but when it got imported into MySQL, the default encoding was latin1. We only realized this after discovering funny characters in the data, and some research. We would like to change the encoding for MySQL to utf8, and have some helpful instructions on this, but my concern is that in the past month, if there were any changes to the data it would have saved in latin1.

The data is entered via web pages set in utf8, but since the database is currently in latin1, would the changes be saved in latin1? If so, how could we figure out which data is really in latin1 and not in the original utf8?

Secondly, if there were some data in latin1 and some in the original utf8 on the same table column, how would we handle the conversion since some data require conversion and some don't?

Thanks for your help!

clarinet
  • 113
  • 1
  • 9
  • You may have to create a migration script - application - that read data as your application and save it in a new MySQL database with the same schema. – SaidbakR Aug 24 '16 at 20:04
  • First, let's double check which situation you have. Please provide `SHOW CREATE TABLE` and perform the `SELECT HEX` discussed in http://stackoverflow.com/a/38363567/1766831 – Rick James Aug 24 '16 at 20:32
  • @Rick James, the SHOW CREATE TABLE shows CHARSET=latin1 and the SELECT HEX on a column that has a special character shows "≥" in place of what should be "≥". This is just one example. This happens because the original data was in utf8, but got imported into the database in latin1. I'm concerned if new data was saved through my web page with some latin1 characters, and then I did the conversion of the database to utf8, would all latin1 characters be able to convert correctly? I would use the MySQL CHANGE method to binary type and then back to the original type with SET CHARACTER UTF8. – clarinet Aug 25 '16 at 18:01
  • The HEX for `≥` is `E289A5` if it correctly stored as utf8. `≥` is the Mojibake for it. `C3A2E280B0C2A5` is the hex if it were "double encoded". Which _HEX_ do you get? I need to know which case you have before proceeding. If you do proceed through `BINARY`, be sure you end up with `E289A5`. – Rick James Aug 26 '16 at 19:53
  • Sorry for the delay. Yes, I get E289A5 for ≥. – clarinet Sep 01 '16 at 15:14

0 Answers0