I've got a (sigh) UTF-8 problem. This is what's going on:

  • The database is 6 or 7 years old. I'm bringing the site back to life but I'm having bad luck with UTF-8 encoding. EDIT: After having another stab I managed to make most of the characters work; for instance, Grüßgott now works. However, I still get black diamonds on curly single and double quotes.

  • I've done a bunch of stackoverflow searching and tried a lot of things.

  • The local dev database I'm working on is set up with UTF8 and utf8_general_ci. I've manually converted each table from latin1_general_ci_as to utf8_general_ci and changed the charset to UTF8 as well. Some of them didn't convert with a query, so I went into the table design manually, and that seemed to fix it. I realise some table COLUMNS can be (and in my case are) set to latin1_general_ci_as, but I'm doing a test case with one specific piece of code, and that (blog) is definitely UTF8.

  • The database connection is set up as follows:

    $connection = mysql_connect($DB_SERVER, $DB_USER, $DB_PASS);
    mysql_query("SET character_set_results=utf8", $connection);
    mb_language('uni'); 
    mb_internal_encoding('UTF-8');
    mysql_select_db($DB_NAME, $connection);
    mysql_query("SET names 'utf8'",$connection);
    
  • The site is procedural PHP. The functions.php file selects the data from the database; see the example below. The only reason I'm showing this is that in my reading I saw some people saying you need to set something every single time, but I'm really hoping you don't need to declare something every time you want to grab something from the database.

    query("SELECT * FROM table_name WHERE data='$something[unique]'");
    
  • The site is HTML5, and I have this in the header.php. As far as I can tell this is all okay.

    <?
    header('Content-Type: text/html; charset=utf-8');
    ?>
    <!DOCTYPE html>
    <html>
    <head>
        <meta charset="utf-8">
    

So, that's where I am. I would really like to solve the problem once and for all and not have to worry about it again. Any help or ideas are greatly appreciated.

Maurice
  • If you have columns that are `latin1_general_ci_as` then I'd expect they would need to be converted at the database level too. – halfer Nov 14 '13 at 16:16
  • Why do you use `mysql_*` and `mysqli_` functions? You cannot mix them! The former are also deprecated and will be removed in near future! – ComFreek Nov 14 '13 at 16:17
  • 1) Have you followed everything here: http://stackoverflow.com/questions/279170/utf-8-all-the-way-through ? 2) Do you understand everything here: [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) ? 3) Is the data displayed correctly in the database using an admin tool that gets the encoding right (hopefully)? – deceze Nov 14 '13 at 16:19
  • Also, check that your output pages are UTF-8 too. – halfer Nov 14 '13 at 16:19
  • One thing is missing: tell either PHP or your web server to send the appropriate Content-Type header along with the response. It should include not only the MIME type (which is most likely text/html), but also the encoding! Header information always overrides meta tags. – Johannes H. Nov 14 '13 at 16:26
  • @Bartdude 1. & 2. I read through both of them but no I did not 100% understand it, I'm by no means an expert at all. 3. Yeah I use NaviCat and I can see all the characters perfectly. I just updated my post to give an example. Thanks :) – Maurice Nov 14 '13 at 16:43
  • @ComFreek Whoops, well spotted. I had it briefly while troubleshooting (trial and error, you might call it :( ). Edited. – Maurice Nov 14 '13 at 16:44
  • @JohannesH. What do you mean by tell PHP or the web server? I've already tried including header('Content-Type: text/html; charset=utf-8'); and it made no difference. – Maurice Nov 14 '13 at 16:46
  • @halfer Not sure what you mean, the whole site runs off a single header include so all pages get all the code examples I provided. – Maurice Nov 14 '13 at 16:48
  • @Maurice: But that was exactly what I was talking about. Keep it in there (and check that it's really sent; it may be overridden somewhere later on in the chain). Even if it's not the cause of your trouble, you should not leave it out. – Johannes H. Nov 14 '13 at 16:52
  • @Maurice: if you're referring to my message "check that your output pages are UTF-8 too", I wrote that before spotting that your last PHP snippet (with the `header`) was not rendering in your question (it needed to be indented to show as code - now fixed). That looks fine, so check your browser to make sure it actually _is_ rendering as UTF-8 - sometimes problems with malformed HTML can revert it to quirks mode, which may use another encoding. – halfer Nov 14 '13 at 16:52
  • @halfer No problems all good :). I've tried it in all my browsers, get the same issue in all of them :(. – Maurice Nov 14 '13 at 16:55
  • Sure, but it is a good idea to _confirm_ that your browser is reading it correctly. In Firefox for example, use right-click and View Page Info. It will tell you what character set it is using under Encoding. – halfer Nov 14 '13 at 16:57
  • @halfer Awesome ideas. I added the code back in and boom, it fixed 95% of the encoding issues. All special characters work, so now it's only curly quotes that seem to be throwing the black diamonds. Also, Firefox tells me it's UTF-8, thanks for the tip :) – Maurice Nov 14 '13 at 17:04

1 Answer

You can try using mysqldump to convert the database from ISO-8859-1 (latin1) to UTF-8:

    mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql
    chgrep latin1 utf8 dump.sql    # or, if you have sed (BSD/macOS syntax): sed -i "" 's/latin1/utf8/g' dump.sql
    mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
    mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql
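
For the curly quotes that still show up as black diamonds, it may help to look at the bytes involved. MySQL's latin1 is effectively Windows-1252, where curly quotes are single bytes in the 0x91 to 0x94 range; on their own those bytes are not valid UTF-8. The following is only a sketch, assuming `iconv` and `od` are installed; the helper name `to_utf8_hex` is made up for illustration and is not part of any tool mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: read Windows-1252 bytes on stdin and
# print the hex of their UTF-8 encoding, with whitespace stripped.
to_utf8_hex() {
    iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# 0x93 (octal 223) is the left curly double quote in Windows-1252.
printf '\223' | to_utf8_hex; echo    # prints e2809c

# 0x92 (octal 222) is the right curly single quote.
printf '\222' | to_utf8_hex; echo    # prints e28099
```

If the page source contains the single bytes rather than the three-byte sequences, the data is probably reaching the browser without being converted, which would be worth checking before re-importing the dump.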
Micromega