13

I'm working with data from an old mysql database. There's a table in this database with a string column that has its encoding set to "cp1252 West European (latin1)" (same as Windows-1252). When querying the data from mysql command prompt, data from this field is represented as:

Obama’s

This is supposed to read

Obama’s

I've tried following the accepted answer for How to convert an entire MySQL database characterset and collation to UTF-8? to convert the field to UTF-8 in MySQL, but it makes no difference.

I also tried inserting a new row into that table, using Obama’s as the text for that field (again, from the mysql command prompt). However, this text is correctly represented when I then query the same row I just inserted. I tried performing that insertion both when the field was set to latin1 and when it was set to UTF-8. Same result.

This leads me to believe that when the bad data was inserted into the database, it was first incorrectly encoded by PHP. This is where it gets fuzzy to me.

I can assume that the data was inserted via a web form and processed with PHP. What did PHP do with it before inserting it into the database? Did it convert the string to UTF-8, which according to the table on this helpful page, uses the three bytes %E2 %80 %99 to represent the character. Do I have that right?

If that's correct, what are my options to repair this data? I'd like to convert the table and its fields to UTF-8 encodings, but that doesn't seem to fix the text. Do I have to write a script that manually changes those characters to what they should be?

Community
  • 1
  • 1
Brian
  • 7,204
  • 12
  • 51
  • 84
  • The problem that caused this is described here: [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/). The connection encoding was very likely set to `latin1` while UTF-8 encoded data was actually being send. To fix this you have to convince MySQL to take the *bytes* of the current data in `latin1` and then insert it as *`utf8` bytes*. There's an SQL one-liner `UPDATE` command for that, can't find it right now though. – deceze Nov 22 '13 at 18:24
  • dear Brian, please, accept the only answer, it saved my life – vladkras Feb 15 '17 at 05:50

1 Answers1

36
select convert(binary convert(field_name using latin1) using utf8) from table_name

If this displays correctly you can do update.

Imre L
  • 6,159
  • 24
  • 32