59

Only today I realized that I was missing this in my PHP scripts:

mysql_set_charset('utf8');

All my tables are InnoDB, collation "utf8_unicode_ci", and all my VARCHAR columns are "utf8_unicode_ci" as well. I have mb_internal_encoding('UTF-8'); on my PHP scripts, and all my PHP files are encoded as UTF-8.

So, until now, every time I "INSERT" something with diacritics, example:

mysql_query('INSERT INTO `table` SET `name`="Jáuò Iñe"');

The 'name' contents would be, in this case: Jáuò Iñe.

Since I fixed the charset between PHP and MySQL, new INSERTs are now storing correctly. However, I want to fix all the older rows that are "messed" at the moment. I tried many things already, but it always breaks the strings on the first "illegal" character. Here is my current code:

$m = mysql_real_escape_string('¿<?php echo "¬<b>\'PHP &aacute; (á)ţăriîş </b>"; ?> ă-ţi abcdd;//;ñç´พดแทฝใจคçăâξβψδπλξξςαยนñ ;');
mysql_set_charset('utf8');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('latin1');
mysql_query('INSERT INTO `table` SET `name`="'.$m.'"');
mysql_set_charset('utf8');

$result = mysql_iquery('SELECT * FROM `table`');
while ($row = mysql_fetch_assoc($result)) {
    $message = $row['name'];
    $message = mb_convert_encoding($message, 'ISO-8859-15', 'UTF-8');
    //$message = iconv("UTF-8", "ISO-8859-1//IGNORE", $message);
    mysql_iquery('UPDATE `table` SET `name`="'.mysql_real_escape_string($message).'" WHERE `a1`="'.$row['a1'].'"');
}

It "UPDATE"s with the expected characters, except that the string gets truncated after the character "ă". I mean, that character and following chars are not included on the string.

Also, testing with the "iconv()" (that is commented on the code) does the same, even with //IGNORE and //TRANSLIT

I also tested several charsets, between ISO-8859-1 and ISO-8859-15.

Rick James
  • 135,179
  • 13
  • 127
  • 222
Nuno
  • 3,082
  • 5
  • 38
  • 58
  • Your `Jáuò Iñe` should have been `Jáuò Iñe`, correct? The 2-step ALTER _may_ be appropriate (and relatively simple). See https://stackoverflow.com/a/71960959/1766831 – Rick James Apr 21 '22 at 21:17
  • Rick - I appreciate your help, but this question was asked in 2012, I got an answer on the same day that saved my day (and still grateful for it!), you edited my question in 2020, and 2 years later you're suggesting an answer. That's very funny :D :-) -- good day to you, enjoy your day! – Nuno Apr 22 '22 at 22:04

4 Answers4

154

From what you describe, it seems you have UTF-8 data that was originally stored as Latin-1 and then not converted correctly to UTF-8. The data is recoverable; you'll need a MySQL function like

convert(cast(convert(name using  latin1) as binary) using utf8)

It's possible that you may need to omit the inner conversion, depending on how the data was altered during the encoding conversion.

Community
  • 1
  • 1
ABS
  • 2,092
  • 1
  • 13
  • 6
  • 11
    WOW you saved my day! I never used those functions, used now on an UPDATE and it worked. Thank you VERY MUCH! – Nuno Feb 23 '12 at 06:11
  • 2
    Thanks for this! I've build a small php script that loops through all columns in each table. Did the trick :) – wiesson Nov 18 '14 at 14:27
  • THANKS A LOT ABS!!!!! A lot of related SO questions but only this one contained the function to convert correctly to UTF-8 – alds Aug 12 '15 at 12:24
  • your answer helped for my fixes. Thanks – Maniprakash Chinnasamy Sep 16 '15 at 08:13
  • 1
    After more than 5 years, this great answer saved me from lot of heart aches. you are simply brilliant. However, I have a question: if there are many columns to convert, will a wildcard work instead of name? – Stephen Adelakun Jun 10 '17 at 23:00
  • I wrote [a little Ruby script](https://gist.github.com/scotchi/0823048d923ab506db858dd5e90a5dc9) to do this for all Wordpress tables. – scotchi Apr 08 '18 at 18:33
  • 1
    Caution: There are many different causes of messed up utf8. That expression works for one case. Here is a list of cases that I have identified: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases – Rick James Feb 29 '20 at 06:43
  • Is there a way to achieve the exact same with php? – obey Dec 15 '20 at 18:09
  • This give me some problem with "new" data, already stored in utf8. – Gianluca Demarinis Aug 17 '21 at 15:15
  • It's still useful in 2021. Thanks. If someone needs to run the Ruby script in Rails (as rake task) simply omit the creation of the client replace `client.query` with `ActiveRecord::Base.connection.execute` and it should work. – Sig Oct 15 '21 at 08:34
  • In 2022 after updating from really old debian and basically websites which were created using php5, this was the most helpful answer. Thanks! – Crt Mori Nov 15 '22 at 20:03
  • 1
    **CAUTION** don't run the above function on data that is *already* in UTF8. It will **nuke** everything after the first already-UTF8 character. You can witness this by running it over the same field twice. – UncaAlby Jan 17 '23 at 16:27
  • Thank you! This fixed an issue I was having. – Andrew Brēza Feb 02 '23 at 19:51
35

After I searched about an hour or two for this answer, I needed to migrate an old tt_news db from typo into a new typo3 version. I tried to convert the charset in the export file and import it back already, but didn't get it working.

Then I tried the answer above from ABS and started an update on the table:

UPDATE tt_news SET 
    title=convert(cast(convert(title using  latin1) as binary) using utf8), 
    short=convert(cast(convert(short using  latin1) as binary) using utf8), 
    bodytext=convert(cast(convert(bodytext using  latin1) as binary) using utf8)
WHERE 1

You can also convert imagecaption, imagealttext, imagetitletext and keywords if needed. Hope this will help somebody migrating tt_news to new typo3 version.

happymacarts
  • 2,547
  • 1
  • 25
  • 33
Marcel Grolms
  • 418
  • 4
  • 6
0

the way is better way use connection tow you database normal

then use this code to make what you need you must make your page encoding utf-8 by meta in header cod html (dont forget this)

then use this code

    $result = mysql_query('SELECT * FROM shops');
    while ($row = mysql_fetch_assoc($ 
    $name= iconv("windows-1256", "UTF-8", $row['name']);

   mysql_query("SET NAMES 'utf8'"); 
   mysql_query("update   `shops` SET `name`='".$name."'  where ID='$row[ID]'  ");
    }
hussien
  • 9
  • 1
0

I highly recommend using 'utf8mb4' instead of 'utf8', since utf8 cannot store some chinese characters and emojis.

burkul
  • 113
  • 1
  • 5