
I have a string that holds an XML structure. One of the elements contains Chinese characters. To convert the XML to JSON, I use json_encode(). The output for the Chinese characters is garbled.

I tried checking the encoding with mb_detect_encoding and even tried the solution listed here.

I've googled around (a lot) and found numerous other resources but none of them seems to solve my problem. Any help is much appreciated.

Code:

<?php
$str = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rootjson>
  <widget>
    <debug>on</debug>
    <text>
      <data>點擊這裡</data>
      <size>36</size>
      <alignment>center</alignment>
    </text>
  </widget>
</rootjson>
XML;

$xml = simplexml_load_string($str);
if ($encoding = mb_detect_encoding($xml, 'UTF-8', true)) echo 'XML is utf8';  //It finds it to be utf8
$json = json_encode($xml, JSON_PRETTY_PRINT);
if ($encoding = mb_detect_encoding($json, 'UTF-8', true)) echo 'Json is utf8';  //It also finds it to be utf8
var_dump($json);
?>

Output:

{
    "widget": {
        "debug": "on",
        "text": {
            "data": "\u9ede\u64ca\u9019\u88e1",
            "size": "36",
            "alignment": "center"
        }
    }
}

I don't think I can trust mb_detect_encoding here, as it reports that both $xml and $json are UTF-8 encoded. The Chinese string 點擊這裡 now shows up as \u9ede\u64ca\u9019\u88e1.

Paulo Hgo
  • Read the documentation: http://php.net/manual/en/function.json-encode.php. What you need is JSON_UNESCAPED_UNICODE – Mihai Nita Jan 28 '17 at 03:57
  • You're so right!! Thanks a lot, I looked at the manual and didn't realize that. If you'd like to formally answer the questions I can vote it up and mark it. Thanks again! – Paulo Hgo Jan 28 '17 at 05:35

1 Answer


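What you need is the JSON_UNESCAPED_UNICODE flag; see the documentation at php.net/manual/en/function.json-encode.php. By default, json_encode() escapes non-ASCII characters as \uXXXX sequences. That output is not garbled: it is valid JSON and decodes back to the same characters. The flag only makes json_encode() emit the characters literally.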
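As a rough sketch of how this could look with the XML from the question (trimmed here to the relevant element; the variable names $escaped, $literal, $a and $b are only illustrative), the flag can be combined with JSON_PRETTY_PRINT using a bitwise OR:

```php
<?php
// XML from the question, trimmed to the element that matters.
$str = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rootjson>
  <widget>
    <text>
      <data>點擊這裡</data>
    </text>
  </widget>
</rootjson>
XML;

$xml = simplexml_load_string($str);

// Default behaviour: non-ASCII characters are escaped as \uXXXX.
$escaped = json_encode($xml);

// With JSON_UNESCAPED_UNICODE the characters are written literally.
// Flags are combined with the bitwise OR operator.
$literal = json_encode($xml, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);

echo $literal, PHP_EOL;

// Both forms are valid JSON and decode to the same string.
$a = json_decode($escaped, true);
$b = json_decode($literal, true);
var_dump($a['widget']['text']['data'] === $b['widget']['text']['data']);
```

The escaped form consists only of ASCII characters, which is also valid UTF-8, so it is expected that mb_detect_encoding reports UTF-8 for both $xml and $json in the question.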

Mihai Nita