0

I inherited a large database of cities around the world.

I noticed that some cities have garbled names in the placeName column.

For example:

Cité Administrative de l'Etat appears as Cité Administrative de l'Etat in the database.

The collation of the placeName column is utf8_general_ci and the collation of the table is utf8mb4_unicode_ci.

How can I find all the cities names in column placeName that do not contain letters that belong to any language's alphabet?

Once those are found, how can I automatically fix them so that, for example, Cité Administrative de l'Etat becomes Cité Administrative de l'Etat without having to do each one manually one by one?

ProgrammerGirl
  • 3,157
  • 7
  • 45
  • 82
  • 2
    Don't you have a way to re-INSERT these values in your DB? Because you'll have the same problem with éèàù ... It would be easier to set the correct encoding on your PHP script and then INSERT the values or update them. – Julien Lachal Jun 22 '17 at 14:20
  • 1
    You could try `INSERT ... SELECT CONVERT( ... USING utf8)` : https://dev.mysql.com/doc/refman/5.7/en/cast-functions.html - because *"Cité"* looks very much like what happens when you try and stuff UTF data into 8859-1 (UTF-8 has twice as many bits per character as Latin, so you get 2 incorrect characters for anything beyond ASCII) - get that sorted then you'll probably want to go through *the checklist* : https://stackoverflow.com/questions/279170/utf-8-all-the-way-through/279279 – CD001 Jun 22 '17 at 14:30
  • @JulienLachal I do not have the original data, so I'm stuck with the garbled data currently in the DB and that's all I have to work with unfortunately. – ProgrammerGirl Jun 22 '17 at 16:21
  • @CD001 Would the `CONVERT` function fix and convert `Cité Administrative de l'Etat` to `Cité Administrative de l'Etat` ? – ProgrammerGirl Jun 22 '17 at 16:22
  • Should do - I've had to do it a few times on old databases I've inherited; try just doing it as a `SELECT` initially to check, you could export that to a CSV then verify the data in say OpenOffice Calc to be on the safe side. – CD001 Jun 23 '17 at 08:06
  • @CD001 Unfortunately, it did not work: http://i.imgur.com/EZYYx06.png Any other ideas? – ProgrammerGirl Jun 23 '17 at 11:57
  • 1
    Try this: `SELECT CONVERT(CAST(CONVERT(\`placeName\` USING latin1) AS BINARY) USING utf8) AS placeName FROM \`YOUR_TABLE\`` ... I actually put something like that (along with some other stuff) into a MySQL Function so I wouldn't have to remember it ;) – CD001 Jun 23 '17 at 12:58
  • @CD001 That worked! Thanks so much. Please add it as a solution so I can accept it. – ProgrammerGirl Jun 23 '17 at 16:19

1 Answers1

0

Look for Mojibake in Trouble with utf8 characters; what I see is not what I stored to see what went wrong.

Then http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases to see how to repair the tables.

Rick James
  • 135,179
  • 13
  • 127
  • 222