4

I have just upgraded from mysql 5.5 to 5.6 and the default encoding in version 5.6 is UTF-8.

All ajax requests are handled in UTF-8 too, so I decided to convert data in the database and the encoding on the website to UTF-8.

But now I'm experiencing a problem.. All special (unicode) chars on the website are displayed incorrect.. They are displayed as encoded UTF-8 strings (double bytes)

The whole site is build in 100% jquery, and all strings are provided by ajax requests and appended/written with jquery... I can't figure out what I'm doing wrong?!

Database connection

$dbh = new PDO($driver.':dbname='.$db.';host='.$host.';port='.$port.';charset=utf8', $user, $pass);

All tables and columns in the database is converted to UTF-8_bin

HTML encoding

header('Content-Type: text/html; charset=utf-8');
<meta charset="utf-8" />

The HTML page is encoded in UTF-8. Is both sending a header from PHP and have added a meta tag in the head in the HTML document

Ajax requests

header('Content-Type: application/json; charset=utf-8');
{
    type : 'post',
    async : true,
    cache : false,
    dataType : 'json',
    timeout : 15000,
    contentType : 'application/x-www-form-urlencoded;charset=utf-8',
    global : true,
    url : APIURL,
    data : {},
    success : function(){}
}

All Ajax requests is made with these properties

Ajax response

{"result":[{"id":"391","string":"BTN_ADD_ACCOUNTS","da":"Tilf\u00c3\u00b8j regnskab","en":"Add accounts"},{"id":"321","string":"BTN_ADD_ENTRY" ...

The value Tilf\u00c3\u00b8j regnskab should be displayed as Tilføj regnskab on the page, but it is displayed as Tilføj regnskab

When browsing the database in phpmyadmin all data showed corretcly

I can put data in the database via Ajax requests and the data is stored correctly, but can't retrieve data via Ajax requests

update - db fetch

header('Content-Type: text/html; charset=utf-8');

$connect = $driver.':dbname='.$db.';host='.$host.';port='.$port.';charset=utf8';
$dbh = new PDO($connect, $user, $pass);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sth = $dbh->prepare("SELECT da FROM lang WHERE string='HDL_CLIENT'");
$sth->execute();
$row = $sth->fetch(PDO::FETCH_ASSOC);
print_r($row);

// result
klient æå
Community
  • 1
  • 1
clarkk
  • 27,151
  • 72
  • 200
  • 340
  • Have you re-saved all your files as UTF-8? Are you generating Content-Type headers for UTF-8? – Álvaro González Apr 28 '13 at 19:23
  • If you mean the source files? That shouldn't have influence?! All data is provided from ajax requests and it's fetched from the database.. No data/strings are saved in the source files.. The files only contain code... Or am I wrong? :) – clarkk Apr 28 '13 at 19:28
  • Yes.. `header('Content-Type: text/html; charset=utf-8');` – clarkk Apr 28 '13 at 19:30
  • 1
    To be honest, there're too many steps involved so it's difficult to provide general advice. You can check the SO classic [UTF-8 all the way through](http://stackoverflow.com/questions/279170/utf-8-all-the-way-through) to see if you've omitted something and use your favourite browser toolset to inspect the HTTP requests. – Álvaro González Apr 28 '13 at 19:33
  • thanks for the link :) but can't still really make it work.. Just want to make one thing clear.. How should the ajax request results look like? Should the string be represented like this `Tilf\u00c3\u00b8j regnskab` – clarkk Apr 28 '13 at 20:56
  • Yes, that's how JSON looks like ([you can validate your JSON if unsure](http://jsonlint.com)). It has nothing to do with UTF-8, those are JavaScript escape sequences with Unicode code points. – Álvaro González Apr 28 '13 at 21:00
  • ok.. and you need to put `header('Content-Type: application/javascript; charset=utf-8');` on the request which returns the ajax results? – clarkk Apr 28 '13 at 21:05
  • You could try the following to ensure your ajax calls are set up correctly: `$.ajaxSetup({ scriptCharset: "utf-8" , contentType: "application/json; charset=utf-8"});` – mccannf Apr 28 '13 at 21:06
  • @clarkk - Browsers normally guess but it always helps, but you are not returning JavaScript code, you are returning JSON. The MIME type is `application/json`. – Álvaro González Apr 28 '13 at 21:09
  • @Alvaro, It works fine when sending `Tilføj regnskab` to the db via ajax.. The data is saved as `Tilføj regnskab` in the db, but when retrieving data from the db I can't get it the right way.. don't you have the magic answer to the problem? :) the ajax retrieving results seems to be right – clarkk Apr 28 '13 at 21:32
  • 1
    No, `Tilf\u00c3\u00b8j regnskab` is *not* what you want! Instead, you want `Tilf\u00f8j regnskab`. – icktoofay Apr 28 '13 at 21:48
  • Browsers guess *only* unless webserver sends the encoding. Thanks to an old IE bug, the encoding on the page is ignored. But in this case you're sending utf IN json. So I think in this case the encoding is wrong in the database (or when fetched from database by the library). – Paolo Casciello Apr 29 '13 at 08:19
  • What are you using to serve the pages? It needs to be configured to serve UTF-8 content too (meta tag is not enough). – heikkim Apr 29 '13 at 08:20
  • Is the data in the database actually correct? Sounds like you may have strings with a messed up encoding stored in the database which comes to light now as you may have switched connection encodings. See [Handling Unicode Front To Back In A Web App](http://kunststube.net/frontback/) for a low-level explanation of what may have happened. – deceze Apr 29 '13 at 08:22
  • @heikkim since he sends JSON serialized data (which is encoding independent), it's not important the server encoding in this case. The js json parser sees it's unicode and decodes correctly. I still think the problem is at the source. – Paolo Casciello Apr 29 '13 at 08:24
  • When browsing the database in phpmyadmin all data is showed correctly – clarkk Apr 29 '13 at 08:25
  • You're *not* running the data returned from the database through `utf8_encode` or some such, are you? – deceze Apr 29 '13 at 08:27
  • I can put data to the database via Ajax requests and the data is stored correctly, but can't retrieve data via Ajax requests – clarkk Apr 29 '13 at 08:28

1 Answers1

1

I hadn't noticed the important bit. As icktoofay points out, your JSON is already corrupted:

>>> "Tilf\u00c3\u00b8j regnskab"
"Tilføj regnskab"

You have two characters (\u00c3\u00b8) to represent ø (which is obviously one character). Since ø is U+00F8 it should be just \u00f8:

>>> "Tilf\u00f8j regnskab"
"Tilføj regnskab"

The problem is in the PHP script that generates that JSON, not in the JavaScript code that consumes it. Possibilities:

  • Your data is already corrupted at DB.
  • Your data gets corrupted on fetch.

The UTF-8 representation for ø is "0xC3 0xB8 (c3b8)". To need to ensure that you see those bytes. To print hexadecimal values:

  • MySQL - SELECT HEX(column_name) FROM table_name
  • PHP - var_dump( bin2hex($variable) );
Álvaro González
  • 142,137
  • 41
  • 261
  • 360
  • ok.. :) but when browsing the database in phpmyadmin all special chars is displayed correcty.. – clarkk Apr 29 '13 at 09:44
  • I have now made a little test.. Check my question update.. Have pasted the php code which fetch data from the db – clarkk Apr 29 '13 at 10:45
  • @clarkk - 1) What's your PHP version? 2) What does `var_dump( bin2hex($row) );` print? – Álvaro González Apr 29 '13 at 10:51
  • is using php 5.4.3.. result `string(22) "4b6c69656e7420c3a6c3a5" ` ... plain text string `Klient æå` – clarkk Apr 29 '13 at 10:54
  • Everything suggests that your database does not have `ø` stored: it has `æ` plus `å`. If you run `SELECT HEX(column_name) FROM table_name` you'll confirm it. – Álvaro González Apr 29 '13 at 11:01
  • query: `SELECT HEX(da), da FROM lang WHERE string='HDL_CLIENT'`result: `array(2) { ["HEX(da)"]=> string(22) "4B6C69656E7420C3A6C3A5" ["da"]=> string(11) "Klient æå" }` – clarkk Apr 29 '13 at 11:06
  • ok.. so the problem is that all results fetched from the DB is not UTF8.. but the UTF8 charset is defined in the PDO connection `$connect = $driver.':dbname='.$db.';host='.$host.';port='.$port.';charset=utf8';` – clarkk Apr 29 '13 at 11:12
  • 1
    Confirmed. In UTF-8, æ is `c3a6` and å is c3a5. There you are both. Your table does not have any ø at all. Everything looks 100% correct UTF-8. It only doesn't have the data you think. Now, how did that invalid data get inserted in the first place? Why does phpMyAdmin display it incorrectly? Can you reproduce the wrong insert? – Álvaro González Apr 29 '13 at 11:13
  • It not about if it is ø, ø or å.. The problem is that the data passed to JSON is not UTF8 even if the tables are – clarkk Apr 29 '13 at 11:17
  • It's possible that when you converted your table from Latin-1 to UTF-8 you didn't change the data itself. But it's alright, if you prefer to somehow fix it right before composing your JSON, give it a try. Just ensure that new data is carefully corrupted on insert to match existing data so it can be de-corrupted before `json_encode()`. – Álvaro González Apr 29 '13 at 11:22
  • I don't get it.. `Klient æå` is inserted from phpmyadmin!? So it should have the correct encoding – clarkk Apr 29 '13 at 11:28
  • The [ø character](http://www.fileformat.info/info/unicode/char/f8/index.htm) is represented as `c3b8` in UTF-8. If you don't have `c3b8` in your table, you don't have ø. Period. It's as simple as that. You can blame it on JSON or claim that PhpMyAdmin cannot be wrong but you do not have a ø character. – Álvaro González Apr 29 '13 at 11:32
  • why do you continue talking about `ø`? the original example was about `Tilføj regnskab` and the current example is about `Klient æå` – clarkk Apr 29 '13 at 11:34
  • Please clarify. Do you mean that real data must be "æå"? The *should be displayed as Tilføj regnskab on the page* part of your question mislead me. – Álvaro González Apr 29 '13 at 11:48
  • sorry my bad.. found out that I forgot to remove a `utf8_encode` :) it works now.. thanks for your help – clarkk Apr 29 '13 at 11:58