21
$test = json_encode('بسم الله');
echo $test;

This code outputs "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647", but I expected something like "بسم الله". Arabic characters are escaped when they are JSON encoded, whereas the YouTube API does not escape them: http://gdata.youtube.com/feeds/api/videos/RqMxTnTZeNE?v=2&alt=json

You can see that YouTube displays the Arabic characters properly. What could be my mistake?

HINT: I'm working on an API; the example is just for the sake of clarification.

Mohamed Said

4 Answers

41

"\u0628\u0633\u0645 \u0627\u0644\u0644\u0647" and "بسم الله" are equivalent in JSON.
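
One way to check this equivalence is to decode both forms and compare the results. This is a minimal sketch, assuming the script file is saved as UTF-8; the variable names $escaped and $literal are only illustrative:

```php
<?php
// The escaped JSON text, as json_encode() produces it by default.
// Single quotes keep the backslashes literal in PHP.
$escaped = '"\u0628\u0633\u0645 \u0627\u0644\u0644\u0647"';

// The same JSON string written with literal UTF-8 characters.
$literal = '"بسم الله"';

// Both JSON texts decode to the identical PHP string.
var_dump(json_decode($escaped) === json_decode($literal));
```

Any conforming JSON parser on the client side should treat the two forms the same way.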

PHP simply defaults to Unicode escapes instead of literal characters for non-ASCII characters.

You can specify otherwise with JSON_UNESCAPED_UNICODE (provided you are using PHP 5.4 or later).

json_encode('بسم الله', JSON_UNESCAPED_UNICODE);
Quentin
2

That is the correct JSON-encoded version of the UTF-8 string. There is no need to change it, because it represents the same string; JSON allows characters to be escaped this way.

JSON can also represent these characters directly as raw UTF-8 if you prefer. Since PHP 5.4 you can set the JSON_UNESCAPED_UNICODE flag to produce raw UTF-8 strings:

json_encode($string, JSON_UNESCAPED_UNICODE)

But that is only a preference; it is not necessary.

deceze
2

Both formats are valid and equivalent JSON strings:

char
    any-Unicode-character-
        except-"-or-\-or-
        control-character
    \"
    \\
    \/
    \b
    \f
    \n
    \r
    \t
    \u four-hex-digits

If you prefer the unescaped version, simply add the JSON_UNESCAPED_UNICODE flag:

<?php

$test = json_encode('بسم الله', JSON_UNESCAPED_UNICODE);
echo $test;

This flag requires PHP/5.4.0 or greater.
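
If the code may also run on older PHP versions, one option is to apply the flag only when it is defined. This is a rough sketch, not a required pattern; the $flags variable is introduced here for illustration, and it assumes PHP 5.3 or later, where json_encode() accepts an options argument:

```php
<?php
// Use JSON_UNESCAPED_UNICODE only when the running PHP defines it (5.4.0+).
// Otherwise fall back to the default escaped output, which is equivalent JSON.
$flags = defined('JSON_UNESCAPED_UNICODE') ? JSON_UNESCAPED_UNICODE : 0;

$test = json_encode('بسم الله', $flags);
echo $test;
```

Either way the client receives valid JSON; only the byte-level representation differs.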

Álvaro González
2

Well, as mentioned before, it doesn't matter, since both strings are equivalent. What you SHOULD do, however, is make sure that the encoded string is decoded before it is sent to output.

echo json_decode($test);

Or, because the JSON most likely contains more than just a single string:

$obj['arabic'] = 'بسم الله';
$obj['latin'] = 'abcdef';
$obj['integer'] = 12345;

$test = json_encode($obj);

// Pass true so json_decode() returns an associative array instead of a stdClass object.
$testobject = json_decode($test, true);
echo $testobject['arabic'];
itsid