1

I am not sure how my error is termed, but I think my data show hex codes rather than the actual foreign characters.

To be more precise, I have a MySQL database with data like:

  • G<e9>rard instead of Gérard, or
  • M<fc>nster instead of Münster.

Apparently my columns have an utf8_unicode_ci-encoding (according to phpMyAdmin).

Now I wish to convert strings like <e9> into é, either directly in the MySQL-database, or in PHP when the output is being shown.

Apparently others were able to use this response to convert their MySQL-table successfully;

UPDATE db_name SET 
    column1=convert(cast(convert(column1 using latin1) as binary) using utf8), 
    column2=convert(cast(convert(column2 using  latin1) as binary) using utf8)

However, this doesn't change anything in my case.

So how can I achieve the conversion?

Thank you!

anpami
  • 760
  • 5
  • 17
  • It's not clear if you mean your data `Grard` has a single byte E9 as the second byte, or does it literally have four printable characters ``? That's going to be much trickier to convert using SQL. – Bill Karwin Mar 15 '21 at 16:33
  • @BillKarwin, I fear these are four printable characters. Is it "tricky", or is it "impossible"? – anpami Mar 15 '21 at 18:57

1 Answers1

1

Here's how I would fix this if the special characters are actually sequences of four characters.

First, make sure the table is all converted to utf8mb4:

ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4;

Use the REPLACE() function to fix each character one by one.

UPDATE mytable SET 
  column1 = REPLACE(column1, '<e9>', 'é'),
  column2 = REPLACE(column2, '<e9>', 'é');

Be careful if you're editing this SQL query by copy & paste. Make sure you fix column2 in both the left side and right side of the =. Otherwise if you forget one, you could copy the content of column1 into column2, and lose the old content of column2.

Once you're done with é, then do a similar statement for ü:

UPDATE mytable SET 
  column1 = REPLACE(column1, '<fc>', 'ü'),
  column2 = REPLACE(column2, '<fc>', 'ü');

Gradually you will clean up all these hex sequences. You can search the table to see if you have any remaining:

SELECT DISTINCT REGEXP_SUBSTR(column1, '<[[:xdigit:]]{2}>') FROM mytable 
WHERE REGEXP_LIKE(column1, '<[[:xdigit:]]{2}>');

(MySQL 8.0 is required for REGEXP_SUBSTR())

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828