Recreate encoding mixup in MySQL table

Question

I have an SQL table where a column has the utf8_unicode_ci collation, but the table itself has the latin1_swedish_ci collation (as reported under Row Statistics in the Structure tab of phpMyAdmin).

The PHP webapp that accesses the database displays Japanese text correctly, but inside phpMyAdmin everything is mojibake. The webapp (correctly) displays the Japanese text Xで有名な, but in phpMyAdmin it is Xã¦ã‚™æœ‰åãª (hex() output is 312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA).

The app that was used to generate the data in the table is now broken, but I need to add a few new records. How can I recreate the mojibake found in the table?

I tried to reproduce the mojibake with Python:

def rev_engineer(utf8):
    mojibake = utf8.encode('utf8').decode('latin1')
    print(mojibake)

rev_engineer('Xで有名な')
# output:    Xã¦ãæåãª
# should be: Xã¦ã‚™æœ‰åãª

This is obviously very similar, but not quite there. I then tried looping through every encoding listed in Python's documentation and trying every encode/decode combination, but that did not produce a match either. Any idea what I'm missing?

can you run **select hex(column_name) from your table where id=<...>;** — EchoMike444, Aug 04 '18 at 00:02
@EchoMike444 312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA — reynoldsnlp, Aug 06 '18 at 16:48

EchoMike444 · Accepted Answer · 2018-08-07T15:24:01.987

First, make sure the characters I type are interpreted as UTF-8 sequences:

test> set names utf8 ;
Query OK, 0 rows affected (0.00 sec)

Check that é is sent as 2 bytes:

test> select hex(binary('é')) ;
+-------------------+
| hex(binary('é')) |
+-------------------+
| C3A9              |
+-------------------+
1 row in set (0.00 sec)

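Check that the stored bytes decode back to the original text (read the hex as utf8, convert to latin1, then reinterpret those bytes as utf8):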

test> select convert(binary(convert(convert(unhex('312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA') using utf8 ) using latin1 )) using utf8 );
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| convert(binary(convert(convert(unhex('312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA') using utf8 ) using latin1 )) using utf8 ) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1. Xで有名な                                                                                                                                          |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
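The same decoding can be cross-checked outside MySQL. The following is a minimal Python sketch, not part of the original answer: the helper name `mysql_latin1_encode` is hypothetical, and it assumes that MySQL's latin1 behaves like Windows-1252 with the bytes that cp1252 leaves undefined (such as 0x81, 0x8D and 0x90) passed through unchanged. The hex dump in the question is consistent with that assumption.

```python
import unicodedata

# Bytes copied from the hex() output in the question.
STORED_HEX = (
    "312E2058C3A3C281C2A6C3A3E2809AE284A2"
    "C3A6C593E280B0C3A5C290C28DC3A3C281C2AA"
)

def mysql_latin1_encode(text: str) -> bytes:
    """Hypothetical helper: encode the way MySQL's latin1 is assumed to,
    i.e. cp1252 plus pass-through for code points cp1252 cannot encode."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # C1 controls such as U+0081 have no cp1252 byte;
            # ISO-8859-1 maps them one-to-one onto bytes 0x80-0x9F.
            out += ch.encode("latin-1")
    return bytes(out)

# The string phpMyAdmin displays (the stored bytes read as UTF-8).
mojibake = bytes.fromhex(STORED_HEX).decode("utf-8")

# Undo the extra encoding step to get the text the web app shows.
original = mysql_latin1_encode(mojibake).decode("utf-8")

# The stored text is in decomposed form (a base kana followed by a
# combining voiced-sound mark), so normalize before printing.
print(unicodedata.normalize("NFC", original))
```

The decomposed form matters here: the recovered text contains U+3066 followed by U+3099 rather than the single precomposed character U+3067, which is one reason a plain `encode('utf8').decode('latin1')` on composed input does not reproduce the stored bytes.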

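By copying and pasting the string returned by the query above, I can reverse the process and reproduce the stored bytes. Note that the pasted string keeps its decomposed form: the hex shows E3 81 A6 followed by E3 82 99 (a base kana plus a combining voiced-sound mark), so typing the precomposed character instead would give different bytes.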

test> select hex(convert (convert(binary('1. Xで有名な') using latin1 ) using utf8 )) ;
+---------------------------------------------------------------------------------+
| hex(convert (convert(binary('1. Xで有名な') using latin1 ) using utf8 )) |
+---------------------------------------------------------------------------------+
| 312E2058C3A3C281C2A6C3A3E2809AE284A2C3A6C593E280B0C3A5C290C28DC3A3C281C2AA      |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)

If you only have a few rows to insert, insert them with phpMyAdmin; if that does not work, insert them directly with the mysql command-line client.

If you want to use Python, you can use this module: https://pypi.org/project/mysql-latin1-codec/
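For illustration, here is a standard-library-only sketch of what such a codec has to do. It does not use the module above, and the function names are hypothetical. It rests on two assumptions drawn from the hex dump in the question: MySQL's latin1 is treated as Windows-1252 with undefined bytes passed through as-is, and the original application stored the text in decomposed (NFD) form.

```python
import unicodedata

def mysql_latin1_decode(raw: bytes) -> str:
    """Hypothetical helper: decode bytes the way MySQL's latin1 is assumed
    to, i.e. cp1252 where defined, otherwise the byte value itself."""
    chars = []
    for b in raw:
        try:
            chars.append(bytes([b]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90, 0x9D are undefined in cp1252;
            # map them to the control characters U+0081, U+008D, ...
            chars.append(chr(b))
    return "".join(chars)

def make_mojibake(text: str) -> str:
    """Recreate the double-encoded string for a piece of clean text."""
    # Assumption: the stored rows are NFD, as the hex dump suggests.
    decomposed = unicodedata.normalize("NFD", text)
    return mysql_latin1_decode(decomposed.encode("utf-8"))

mojibake = make_mojibake("1. Xで有名な")

# UTF-8 bytes of the mojibake, for comparison with hex() from MySQL.
print(mojibake.encode("utf-8").hex().upper())
```

Comparing the printed hex with the `hex()` output of an existing row is a quick sanity check before inserting anything.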

Note: That first long CONVERT(...) can be used in an `UPDATE` for "fixing" the data in the table. This is an example of "double encoding". More discussion of that here: https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored . Do not use python conversions, use SQL. — Rick James, Aug 17 '18 at 23:32
