I am having trouble JSON-encoding special characters. These characters display normally on my computer, in Notepad, in browsers, and even in my database. However, they do not JSON encode. An example is as follows:

<?php
$array['copyright_str'] = "Copyright site.com © 2011-2012";
echo json_encode($array);
?>

The copyright symbol after site.com is what is making the JSON string echo as {"copyright_str":null}. While this is simple, I have users inputting profile data into a database which can be anything. When one of these funky characters shows up it breaks things. What is a good solution to this issue? The API I coded relies heavily on returning data from the database and printing strings in general as JSON.

My Multibyte settings are as follows:

     php -e phpinfo.php  | grep mb
    Configure Command =>  './configure'  '--enable-bcmath' '--enable-calendar' '--enable-dbase' '--enable-exif' '--enable-ftp' '--enable-gd-native-ttf' '--enable-libxml' '--enable-magic-quotes' '--enable-mbstring' '--enable-pdo=shared' '--enable-sockets' '--enable-zip' '--prefix=/usr/local' '--with-apxs2=/usr/local/apache/bin/apxs' '--with-bz2' '--with-curl=/opt/curlssl/' '--with-curlwrappers' '--with-freetype-dir=/usr' '--with-gd' '--with-imap=/opt/php_with_imap_client/' '--with-imap-ssl=/usr' '--with-jpeg-dir=/usr' '--with-kerberos' '--with-libdir=lib64' '--with-libexpat-dir=/usr' '--with-libxml-dir=/opt/xml2/' '--with-mcrypt=/opt/libmcrypt/' '--with-mhash=/opt/mhash/' '--with-mysql=/usr' '--with-mysql-sock=/var/lib/mysql/mysql.sock' '--with-mysqli=/usr/bin/mysql_config' '--with-openssl=/usr' '--with-openssl-dir=/usr' '--with-pcre-regex=/opt/pcre' '--with-pdo-mysql=shared' '--with-pdo-sqlite=shared' '--with-pic' '--with-png-dir=/usr' '--with-sqlite=shared' '--with-ttf' '--with-xmlrpc' '--with-xpm-dir=/usr' '--with-zlib' '--with-zlib-dir=/usr'
    xmlrpc_error_number => 0 => 0
    mbstring
    Multibyte string engine => libmbfl
    mbstring extension makes use of "streamable kanji code filter and converter", which is distributed under the GNU Lesser General Public License version 2.1.
    mbstring.detect_order => no value => no value
    mbstring.encoding_translation => Off => Off
    mbstring.func_overload => 0 => 0
    mbstring.http_input => pass => pass
    mbstring.http_output => pass => pass
    mbstring.internal_encoding => no value => no value
    mbstring.language => neutral => neutral
    mbstring.strict_detection => Off => Off
    mbstring.substitute_character => no value => no value

I'd like to avoid saving things like &copy;. Some of this data is going to be stored as plain text.

user974896
  • Is PHP compiled for Unicode/MB? And, furthermore, does `json_encode` work correctly on Unicode/MB? –  Mar 15 '12 at 17:48
  • @IbrahimAzharArmar There are many Unicode characters that *have no ASCII equivalent*. –  Mar 15 '12 at 17:50
  • This post http://stackoverflow.com/questions/6058450/problem-json-encode-utf-8 seems to have a solution, although it doesn't strike me as being the "right" solution. It does seem to *require* UTF-8 or it may *silently result in null* http://stackoverflow.com/questions/1972006/json-encode-is-returning-null and http://stackoverflow.com/questions/7938387/json-encode-php-result-is-null (another failed design choice :-/) –  Mar 15 '12 at 17:59

3 Answers

Encode the data as UTF-8 before passing it to the json_encode function:

<?php
    // utf8_encode() converts from ISO-8859-1 to UTF-8, so the input must be ISO-8859-1.
    $array['copyright_str'] = utf8_encode("Copyright site.com © 2011-2012");
    echo json_encode($array);
?>
Saket Patel
  • +1 however this does assume that you're storing and handling all your data as ISO-8859-1, which means your app won't support Unicode characters outside of that one encoding. In the long term you are better off completely migrating to UTF-8. – bobince Mar 15 '12 at 23:08
  • In that case you can use mb_detect_encoding to check which encoding the data is currently in, and then convert it to UTF-8 using mb_convert_encoding. – Saket Patel Mar 16 '12 at 07:37
  • Well... bearing in mind that `mb_detect_encoding` is only ever an approximate guess that could easily be wrong, yes. – bobince Mar 16 '12 at 16:58
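
As a minimal sketch of the conversion discussed in this answer and its comments, the following converts a string from a known source encoding to UTF-8 and then checks whether `json_encode` reported an error. The helper name `to_utf8` and the assumption that the stored data is ISO-8859-1 are illustrative and not from the original answer. `mb_convert_encoding` with an explicit source encoding is used here in place of `utf8_encode`, which is deprecated as of PHP 8.2. The `mb_check_encoding` guard is only a heuristic, since a legacy-encoded string can occasionally also be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): convert a string
// to UTF-8 from a known legacy encoding.
// Assumption: the source data is ISO-8859-1; change $from if it is not.
function to_utf8($value, $from = 'ISO-8859-1')
{
    // Leave strings that are already valid UTF-8 untouched (heuristic).
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value;
    }
    return mb_convert_encoding($value, 'UTF-8', $from);
}

// "\xA9" is the single ISO-8859-1 byte for the copyright sign.
$array = array(
    'copyright_str' => to_utf8("Copyright site.com \xA9 2011-2012"),
);

$json = json_encode($array);

// json_last_error() reports JSON_ERROR_UTF8 when the input is not valid UTF-8.
if (json_last_error() !== JSON_ERROR_NONE) {
    die('JSON encoding failed, error code ' . json_last_error());
}

echo $json; // {"copyright_str":"Copyright site.com \u00a9 2011-2012"}
```

Writing the byte as `"\xA9"` keeps the example independent of the encoding the PHP file itself is saved in. If the real source encoding is something other than ISO-8859-1, passing it as the second argument is probably safer than relying on detection.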

I'm encoding data with tons of UTF-8 symbols with

json_encode($return, JSON_UNESCAPED_UNICODE)

and it works well. I use it to encode all kinds of languages: Arabic, Chinese, Thai, Lithuanian, German, French, Spanish, etc. All those have different unique symbols. Oh, I haven't tried encoding snowmen ☃ :)
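
To illustrate what the flag changes, here is a small sketch comparing the default output with `JSON_UNESCAPED_UNICODE` (available from PHP 5.4). The array key `text` and the sample string are made up for the example. Note that the flag only affects how the output is written; the input still has to be valid UTF-8 in both cases, so on its own it would not fix a string stored in another encoding.

```php
<?php
// "\xC2\xA9" is the UTF-8 byte sequence for the copyright sign (U+00A9),
// "\xE2\x98\x83" is the UTF-8 byte sequence for the snowman (U+2603).
$return = array('text' => "\xC2\xA9 \xE2\x98\x83");

// Default: non-ASCII characters are written as \uXXXX escapes.
$escaped = json_encode($return);

// With the flag: the UTF-8 characters are written out literally.
$unescaped = json_encode($return, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // {"text":"\u00a9 \u2603"}
echo $unescaped, "\n"; // {"text":"© ☃"}

// Both forms decode to the same PHP value.
var_dump(json_decode($escaped, true) === json_decode($unescaped, true)); // bool(true)
```

Either form is valid JSON, so the choice mostly comes down to payload size and readability for whoever consumes the API.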

Lukas Liesis

Use urlencode before json_encode

<?php
$array['copyright_str'] = "Copyright site.com © 2011-2012";
$array['copyright_str'] = urlencode($array['copyright_str']);
echo json_encode($array);
?>
Ayush
  • Why? It is not a URL. That would *alter the data* and require the consumer to do the reverse. –  Mar 15 '12 at 17:48
  • But it will escape the copyright character and convert it to `&copy;`. Reversal is trivial. – Ayush Mar 15 '12 at 17:50
  • That's not the issue or a solution. Imagine if it's *a different* Unicode character (say a ☃, which is a snowman). How would *that* be handled? If it's a one-off-hacky-edge case, clearly it is *not reliable* (unless there happens to be a *bug* with PHP that *only* affects the Unicode character for the copyright symbol). –  Mar 15 '12 at 17:51
  • I'd like to avoid storing URL-encoded data in the database, as I can't directly edit it via phpMyAdmin if need be. – user974896 Mar 15 '12 at 17:52
  • @xbonez It won't store it as `&copy;`, that's HTML encoding. The JSON format has its own way of storing the character, there should be no need to nest different data formats. – Quentin Mar 15 '12 at 22:20
  • Only encode data in this format when you want to pass it as a URL, not for saving it to the database. – Saket Patel Mar 16 '12 at 12:22
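
The objection raised in these comments can be seen in a short sketch, assuming the string is already valid UTF-8 (the variable names are illustrative). `urlencode` does make the string ASCII-only, so `json_encode` succeeds, but the JSON then carries percent-encoded text that every consumer has to decode again.

```php
<?php
// "\xC2\xA9" is the UTF-8 byte sequence for the copyright sign.
$original = "Copyright site.com \xC2\xA9 2011-2012";

// urlencode() turns spaces into "+" and non-ASCII bytes into %XX sequences.
$encoded = urlencode($original);
echo $encoded, "\n"; // Copyright+site.com+%C2%A9+2011-2012

// The JSON now contains the altered string, not the original text.
echo json_encode(array('copyright_str' => $encoded)), "\n";
// {"copyright_str":"Copyright+site.com+%C2%A9+2011-2012"}

// The consumer has to reverse the extra layer to get the text back.
var_dump(urldecode($encoded) === $original); // bool(true)
```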