
I'm trying to convert an array with values in Brazilian Portuguese to JSON.

Here is an array example:

array(1) {
  ["title"]=>
  string(77) "Cartão Credicard Universitário Visa Crédito "
}

If I use mb_detect_encoding, it reports that all keys and values are either ASCII or UTF-8.

However, if I try to use json_encode to generate the JSON, it returns false and json_last_error reports JSON_ERROR_UTF8.
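The failure can be reproduced without a database. The following is a minimal sketch, assuming the mbstring extension is loaded and assuming the value reaches PHP as latin1 (ISO-8859-1) bytes; the question does not show the raw bytes, so that part is a guess. It uses mb_check_encoding, which validates the bytes, rather than mb_detect_encoding, which is only a heuristic.

```php
<?php
// "Cartão" as ISO-8859-1 (latin1) bytes: ã is the single byte 0xE3.
// Assumption: this is what the connection hands back without a charset.
$latin1 = "Cart\xE3o";

// The same word as UTF-8: ã is the two bytes 0xC3 0xA3.
$utf8 = "Cart\xC3\xA3o";

// Validate the bytes instead of guessing the encoding.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// json_encode only accepts valid UTF-8 strings.
$json = json_encode($latin1);
$isUtf8Error = json_last_error() === JSON_ERROR_UTF8;
var_dump($json);        // bool(false)
var_dump($isUtf8Error); // bool(true)

// With valid UTF-8 the call succeeds; the flag only changes the escaping.
$escaped = json_encode($utf8);                           // "Cart\u00e3o"
$unescaped = json_encode($utf8, JSON_UNESCAPED_UNICODE); // "Cartão"
echo $escaped, "\n", $unescaped, "\n";
```

Running bin2hex on a real value from the query shows which of the two byte sequences is actually present.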

But if I first apply the utf8_encode_deep function to the array ( http://php.net/manual/es/function.utf8-encode.php ), the JSON is generated without any errors.

The problem with this solution is that certain words come back with broken encoding.

Example:

Word before applying utf8_encode: Cartão (good encoding)

Word after applying utf8_encode: CartÃ£o (bad encoding)

So although it generates the JSON, it doesn't solve my problem because it messes up the words.
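A garbled result like this is what double encoding produces: utf8_encode treats every input byte as a latin1 character, so running it on text that is already UTF-8 encodes it a second time. Below is a small sketch of that effect, assuming the mbstring extension is available. It uses mb_convert_encoding from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode (deprecated since PHP 8.2).

```php
<?php
// "Cartão" already encoded as UTF-8 (ã = 0xC3 0xA3).
$utf8 = "Cart\xC3\xA3o";

// Same conversion as utf8_encode(): each input byte is read as a latin1
// character and re-encoded, so 0xC3 becomes "Ã" and 0xA3 becomes "£".
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // 43617274c3a36f
echo bin2hex($double), "\n"; // 43617274c383c2a36f

// The damage is reversible as long as no bytes were lost along the way.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
var_dump($restored === $utf8); // bool(true)
```

If the hex dump of a value already shows c3a3 for ã, the data is valid UTF-8 and should not be passed through utf8_encode.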

Here is the code I'm using:

try {
  // Connect without an explicit charset, as in my original attempt
  $dbh = new PDO("mysql:host=$hostname;dbname=$database;", $username, $password);
  $sql = "SELECT title FROM card";
  $stmt = $dbh->query($sql);

  $result = $stmt->fetch(PDO::FETCH_ASSOC);
  $json = json_encode($result); // false when $result contains invalid UTF-8
  $error = json_last_error();

  var_dump($json, $error === JSON_ERROR_UTF8);
} catch (PDOException $e) {
  echo 'Connection failed: ' . $e->getMessage() . "\n";
}

If I try to connect to the database using charset=utf8 or charset=utf8mb4, it retrieves CartÃ£o (bad encoding) instead of Cartão (good encoding).

I have also tried passing JSON_UNESCAPED_UNICODE to json_encode, but the result is the same as without it.

Any suggestions?

UPDATE: I've simplified the example with one concrete case where this problem is happening.

UPDATE 2: Added some code in order to clarify the example, also added some explanations about possible solutions in the comments.

rfc1484
  • Well, where are the values coming from? Can you narrow it down to one specific value that's causing the issue? Once you've narrowed it down, do `bin2hex($value)` on that value to see its bytes. Check an encoding table if those bytes are correct for UTF-8 for the characters you expect. – deceze Jul 10 '14 at 07:35
  • The values come from a MySQL query where the database and table character set is utf8 and the collation is utf8_general_ci. The problem seems to happen only with vowels that carry a tilde: http://en.wikipedia.org/wiki/Tilde (as in the example shown in my question) – rfc1484 Jul 10 '14 at 07:43
  • Tried http://stackoverflow.com/questions/279170/utf-8-all-the-way-through? – deceze Jul 10 '14 at 07:46
  • possible duplicate of [Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?](http://stackoverflow.com/questions/16498286/why-does-the-php-json-encode-function-convert-utf-8-strings-to-hexadecimal-entit) – Sergiu Paraschiv Jul 10 '14 at 07:56
  • Definitely a duplicate of http://stackoverflow.com/questions/279170/utf-8-all-the-way-through – deceze Jul 10 '14 at 08:40

1 Answer

"If I try to connect to the database using charset=utf8 or charset=utf8mb4, it retrieves Cartão(bad codification), instead of Cartão (good codification)"

The text you get with charset=utf8 is correct UTF-8. It only looks wrong because the page is being displayed as latin1.

Add charset=utf8 to the connection string and also set the response charset to UTF-8:

header('Content-Type: text/html;charset=utf-8');
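
Put together, the two steps might look like the sketch below. It is only an illustration under stated assumptions: buildDsn and rowToJson are hypothetical helper names, not part of PDO or the question's code, JSON_THROW_ON_ERROR needs PHP 7.3 or later, and the usage lines are left as comments because they need a reachable MySQL server.

```php
<?php
// Hypothetical helper: build a DSN that requests UTF-8 on the connection.
function buildDsn(string $host, string $db): string
{
    return "mysql:host=$host;dbname=$db;charset=utf8mb4";
}

// Hypothetical helper: encode a row as JSON, keeping accented characters
// readable and throwing a JsonException instead of returning false.
function rowToJson(array $row): string
{
    return json_encode($row, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

// Usage with the variables from the question (requires a database):
// $dbh = new PDO(buildDsn($hostname, $database), $username, $password);
// $row = $dbh->query('SELECT title FROM card')->fetch(PDO::FETCH_ASSOC);
// header('Content-Type: text/html;charset=utf-8');
// echo rowToJson($row);
```

If the response body is the JSON itself rather than an HTML page, application/json is the more usual Content-Type value.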
Joni