
I'm selecting some data from a database and encoding it as JSON, but I have a problem with Czech characters such as

á,í,ř,č,ž...

My file is UTF-8 encoded, my database is UTF-8 encoded, and I've set the header to UTF-8 as well. What else should I do?

My code:

header('Content-Type: text/html; charset=utf-8');
while($tmprow = mysqli_fetch_array($result)) {
    $row['user'] = mb_convert_encoding($tmprow['user'], "UTF-8", "auto");
    $row['package'] = mb_convert_encoding($tmprow['package'], "UTF-8", "auto");
    $row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");
    $row['rating'] = mb_convert_encoding($tmprow['rating'], "UTF-8", "auto");

    array_push($response, $row);
}

$json = json_encode($response, JSON_UNESCAPED_UNICODE);

if(!$json) {
    echo "error";
}

And here is part of the printed JSON: "package":"zv???tkanalouce"

EDIT: Without the mb_convert_encoding() calls, the printed string is empty and "error" is printed.

Machavity
Jakub Turcovsky
  • If you want to output JSON, do not set `Content-Type: text/html` in your header, use `Content-Type: application/json`. – Holt Apr 27 '14 at 10:40
  • Just tell your database driver, on that connection, that you expect UTF-8 encoded string values, and throw away the mb_convert_encoding calls you've plugged in there. There is no "auto" with encodings: either you know what you are doing or you are shooting yourself in the foot, and no computer can take that over for you. Then you should be fine. – hakre Apr 27 '14 at 10:41
  • What do you expect `mb_convert_encoding($tmprow['rating'], "UTF-8", "auto")` to do? Why is it necessary? What does the `"auto"` encoding parameter stand for? Does it mean you don't know which encoding your strings have and which one you need them to be in? Please share. – hakre Apr 27 '14 at 10:42
  • @Holt It looks nicer now, but those chars are still appearing like '?', but thanks – Jakub Turcovsky Apr 27 '14 at 10:47
  • @hakre "auto" should stand for automatic detection of input encoding. – Jakub Turcovsky Apr 27 '14 at 10:48
  • @2rec: Important info: There is no such thing as automatic encoding detection. It's always a guess, so do not rely on it, especially not in cases where you run into trouble. Then it's time to throw such things out, find out what the actual encodings are, and control those. Drop the detection. Do you know how to specify the encoding for the database connection? – hakre Apr 27 '14 at 10:50
  • @hakre If I remove mb_convert_encoding function, it's printing an empty string. – Jakub Turcovsky Apr 27 '14 at 10:50
  • @2rec: `json_encode` does not always return a string. In case of an error, it returns boolean FALSE which when echo'ed is an empty string. You need to check the return value first and deal with the error case. That's required for stable code. – hakre Apr 27 '14 at 10:52
  • @hakre I don't know how to specify that, could you give me some function I could search for please? – Jakub Turcovsky Apr 27 '14 at 10:53
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/51529/discussion-between-2rec-and-hakre) – Jakub Turcovsky Apr 27 '14 at 10:56

1 Answer


With the code you've got in your example, the output is:

json_encode($response, JSON_UNESCAPED_UNICODE);
"package":"zv???tkanalouce"

You see the question marks in there because they have been introduced by mb_convert_encoding. This happens when you use encoding detection ("auto" as the third parameter) and that detection cannot handle a character in the input, so the character is replaced with a question mark. An example line from your code:

$row['url'] = mb_convert_encoding($tmprow['url'], "UTF-8", "auto");

This also means that the data coming out of your database is not UTF-8 encoded, because mb_convert_encoding($buffer, 'UTF-8', 'auto') does not introduce question marks when $buffer is already valid UTF-8.
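To illustrate the point, here is a minimal sketch that needs no database. It assumes, purely for demonstration, that the connection hands back ISO-8859-2 bytes; the real connection charset in the question is unknown and may well be something else. It shows that such bytes are not valid UTF-8, that json_encode rejects them, and that converting from a known source encoding (rather than "auto") restores the text.

```php
<?php
// Assumption for illustration only: the driver returns ISO-8859-2 bytes.
$utf8   = "zvířátkanalouce";                                  // literal from a UTF-8 source file
$latin2 = mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8');  // simulate the non-UTF-8 bytes

// The simulated bytes are not valid UTF-8 ...
$isValidUtf8 = mb_check_encoding($latin2, 'UTF-8');

// ... so json_encode() returns false and reports a UTF-8 error.
$failed    = json_encode(['package' => $latin2], JSON_UNESCAPED_UNICODE);
$errorCode = json_last_error();

// Converting with an explicitly named source encoding restores the text.
$repaired = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');
$json     = json_encode(['package' => $repaired], JSON_UNESCAPED_UNICODE);
```

This is only a way to reproduce the symptom; the conversion step becomes unnecessary once the connection itself delivers UTF-8.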

Therefore you need to find out which charset is used in your database connection because the database driver will convert strings into the encoding of the connection.
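One possible way to inspect the connection is sketched below. The helper name `describeConnectionCharsets` is hypothetical, and the sketch assumes an already opened mysqli connection to a MySQL server; it combines `character_set_name()` with the server's `character_set_%` variables.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Returns the client charset plus the server-side character_set_* variables.
function describeConnectionCharsets(mysqli $mysqli): array
{
    $info = ['client' => $mysqli->character_set_name()];

    $result = $mysqli->query("SHOW VARIABLES LIKE 'character_set_%'");
    if ($result instanceof mysqli_result) {
        while ($row = $result->fetch_assoc()) {
            $info[$row['Variable_name']] = $row['Value'];
        }
        $result->free();
    }

    return $info;
}
```

Calling `print_r(describeConnectionCharsets($mysqli));` on the open connection would then list the values; anything other than a UTF-8 charset for the client, connection and results entries would be consistent with the symptom in the question.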

The easiest approach is to tell the database link that you want UTF-8 strings and then use them as they come:

$mysqli = new mysqli("localhost", "my_user", "my_password", "test");

/* check connection */
if (mysqli_connect_errno()) {
    printf("Connect failed: %s\n", mysqli_connect_error());
    exit();
}

/* change character set to utf8 */
if (!$mysqli->set_charset("utf8")) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}

The code example above just shows how to set the default client character set to UTF-8 with mysqli. It is taken from the PHP manual; see also the existing material on this site, e.g. "utf 8 - PHP and MySQLi UTF8".

You can then greatly improve your code:

$response = $result->fetch_all(MYSQLI_ASSOC);

$json = json_encode($response, JSON_UNESCAPED_UNICODE);

if (FALSE === $json) {
    throw new LogicException(
        sprintf('Not json: %d - %s', json_last_error(), json_last_error_msg())
    );
}

header('Content-Type: application/json'); 
echo $json;
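
As a possible variation on the error check, PHP 7.3 and later offer the `JSON_THROW_ON_ERROR` flag, which makes `json_encode` throw a `JsonException` instead of returning FALSE. The sketch below assumes PHP 7.3+, and the helper name `encodeRows` is hypothetical.

```php
<?php
// Hypothetical helper, assuming PHP 7.3+ (JSON_THROW_ON_ERROR, JsonException).
function encodeRows(array $rows): string
{
    // Throws JsonException on failure, so the result is always a string.
    return json_encode($rows, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

try {
    $json = encodeRows([['package' => 'zvířátkanalouce']]);
    echo $json;
} catch (JsonException $e) {
    // Invalid UTF-8 in the input ends up here instead of producing empty output.
    error_log('JSON encoding failed: ' . $e->getMessage());
}
```

In a real script the `Content-Type: application/json` header would still be sent before the output, as in the code above.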
Community
hakre