5

I’ve got a simple one-dimensional PHP array.

When I do a var dump (echo var_dump($a)), I get this as the output:

array(3) { [0]=>  string(3) "尽" [1]=>  string(21) "exhausted||to exhaust" [2]=>  string(4) "jin3" }

However, when I json_encode it (echo json_encode($a)) I get this:

["\u5c3d","exhausted||to exhaust","jin3"]

The hex value that it’s returning is the correct one, but I can’t figure out how to stop it from giving me the hex. I just want it to display the character.

If I echo mb_internal_encoding() it returns UTF-8, which is what I’ve set it to. I’ve been very careful in all my string manipulation to use the mb_ functions so none of the data gets messed up.

I know that I could write a modified json_encode function which would take care of the problem. But I want to know what’s going on here.

Tyler Wall
Trevor
  • The JSON it is generating is equivalent to the JSON with the character written explicitly. Now, writing the character explicitly would be easier to read, and take fewer bytes, but the two JSON strings are nonetheless equivalent. – Thanatos Nov 29 '09 at 03:52
  • possible duplicate of [Json_encode Charset problem](http://stackoverflow.com/questions/3035462/json-encode-charset-problem) – Ignacio Vazquez-Abrams Nov 25 '10 at 06:54
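
The equivalence mentioned in the first comment can be checked directly. The following is a minimal sketch, assuming the array from the question and a UTF-8 encoded source file; the variable names are made up for illustration.

```php
<?php
// The escaped form produced by json_encode() in the question.
// Single quotes keep \u5c3d as a literal backslash sequence for json_decode().
$escaped = '["\u5c3d","exhausted||to exhaust","jin3"]';

// The same JSON with the character written out literally
$literal = '["尽","exhausted||to exhaust","jin3"]';

$fromEscaped = json_decode($escaped, true);
$fromLiteral = json_decode($literal, true);

// Both forms decode to the same PHP array
var_dump($fromEscaped === $fromLiteral); // bool(true)
```

Any JSON parser that follows the spec should treat the two strings the same way, so the escaping only affects readability and size.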

6 Answers

9

I know this question is older, but I thought I'd share the to_json and to_utf8 functions I use while working in China. They include pretty-printed output (JSON_PRETTY_PRINT) in development and minified output in production. Adapt them to your own environment/system.


Simple

// Produces JSON with Chinese characters left unescaped (requires PHP 5.4+).
// Still valid JSON: RFC 4627 allows unescaped non-ASCII characters in strings.
json_encode($data, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);

to_json()

function to_json($data, $pretty=null, $include_security=false, $try_to_recover=true) {
  // @Note: json_encode() *REQUIRES* data to be in valid UTF-8 format BEFORE
  //                    trying to json_encode, and since we are working with Chinese
  //                    characters, we need to make sure that we explicitly allow:
  //                    JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES
  //                    *Unless json_encode() flags are explicitly passed in via $pretty
    if ($pretty === null && is_env_prod()) { // @NOTE: Substitute with your own Production env check
        $flags = JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES;
    } else if ($pretty === null && is_env_dev()) { // @NOTE: Substitute with your own Development env check
        $flags = JSON_PRETTY_PRINT|JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES;
    } else {
        // Explicit flags were passed in (0, the json_encode() default, if none)
        $flags = (int) $pretty;
    }

    $json_encoded = json_encode($data, $flags);

    // (1) Do not return an error if the initial data was empty
    // (2) Return an error if json_encode() failed
    if ($json_encoded === false && !empty($data)) {
        if ($try_to_recover && (is_array($data) || is_object($data))) {
            // There was data in $data, so try to recover a little by removing
            // the top-level $k => $v pairs that fail to be JSON encoded
            foreach (((array) $data) as $k => $v) {
                if (json_encode([$k => $v]) === false) {
                    if (is_array($data)) {
                        unset($data[$k]);
                    } else {
                        unset($data->{$k});
                    }
                }
            }

            // We've removed the offending data; passing false makes this a
            // one-time recovery, so the retry cannot recurse again
            return to_json($data, $pretty, $include_security, false);
        }

        // Recovery is disabled or has already been tried: report the error.
        // The error details are read before the inner json_encode() resets them.
        $json_encoded = json_encode([
            'status' => false,
            'message' => 'json_encoding_error',
            'error' => [
                'json_last_error' => json_last_error(),
                'json_last_error_msg' => json_last_error_msg()
            ]
        ]);
    }

  return ( ($include_security) ? ")]}',\n" : '' ) . $json_encoded;
}

Another function that might be useful is my recursive to_utf8() function:

to_utf8()

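// @NOTE: Common Chinese GBK encoding: to_utf8($data, 'GB2312')
// @NOTE: The 'HTML-ENTITIES' default is deprecated as of PHP 8.2; pass a real source encoding there
function to_utf8($in, $source_encoding='HTML-ENTITIES') {
  if (is_string($in)) {
    // Argument order is mb_convert_encoding($string, $to_encoding, $from_encoding)
    return mb_convert_encoding(
      $in,
      'UTF-8',
      $source_encoding
    );
  } else if (is_array($in) || is_object($in)) {

    // array_walk_recursive() descends into nested arrays itself and cannot
    // rename keys, so only the values are converted here
    array_walk_recursive($in, function(&$item, $key) use ($source_encoding) {
      if (is_object($item)) {
        $item = to_utf8($item, $source_encoding);
      } else if (is_string($item) && !mb_check_encoding($item, 'UTF-8')) {
        // Same conversion utf8_encode() performs (ISO-8859-1 to UTF-8);
        // utf8_encode() itself is deprecated as of PHP 8.2
        $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
      }
    });

    $ret_object = is_object($in);
    return ($ret_object) ? (object) $in : (array) $in;
  }

  return $in;
}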

Validate RFC4627 (valid JSON)

$pcre_regex = '
  /
  (?(DEFINE)
     (?<number>   -? (?= [1-9]|0(?!\d) ) \d+ (\.\d+)? ([eE] [+-]? \d+)? )
     (?<boolean>   true | false | null )
     (?<string>    " ([^"\\\\]* | \\\\ ["\\\\bfnrt\/] | \\\\ u [0-9a-f]{4} )* " )
     (?<array>     \[  (?:  (?&json)  (?: , (?&json)  )*  )?  \s* \] )
     (?<pair>      \s* (?&string) \s* : (?&json)  )
     (?<object>    \{  (?:  (?&pair)  (?: , (?&pair)  )*  )?  \s* \} )
     (?<json>   \s* (?: (?&number) | (?&boolean) | (?&string) | (?&array) | (?&object) ) \s* )
  )
  \A (?&json) \Z
  /six
';

$matches = false;
preg_match($pcre_regex, trim($body), $matches);

var_dump('RFC4627 Verification (Regex) ', [
  'has_passed' => (count($matches) == 1) ? 'YES' : 'NO',
  'matches'    => $matches
]);

is_json()

// One Liner
is_string($json_string) && !preg_match('/[^,:{}\\[\\]0-9.\\-+Eaeflnr-u \\n\\r\\t]/', preg_replace('/"(\\.|[^"\\\\])*"/', '', $json_string));

// Alt function: more consistent
function is_json($json_string) {
  if (!is_string($json_string) || is_numeric($json_string)) {
      return false;
  }

  json_decode($json_string);

  // Check only the error state: comparing the decoded value against null
  // would also reject valid JSON such as "[]", "false" or "null"
  return json_last_error() === JSON_ERROR_NONE;

  // Inconsistent results from the regex, reverted to json_decode() + JSON_ERROR_NONE check
  // return is_string($json_string) && !preg_match('/[^,:{}\\[\\]0-9.\\-+Eaeflnr-u \\n\\r\\t]/', preg_replace('/"(\\.|[^"\\\\])*"/', '', $json_string));
}

is_utf8()

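function is_utf8($str) {
  if (is_array($str)) {
    foreach ($str as $k=>$v) {
      // Check strings and recurse into nested arrays
      if ((is_string($v) || is_array($v)) && !is_utf8($v)) {
        return false;
      }
    }

    // No invalid string was found in the array
    return true;
  }

  // With the /u modifier, preg_match() fails on invalid UTF-8
  return (is_string($str) && preg_match('//u', $str) === 1);
}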
Tyler Wall
5

The escaping that json_encode() performs is perfectly valid, but unnecessary. As of PHP 5.4, it can be disabled with the JSON_UNESCAPED_UNICODE flag.
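
As a quick illustration of the flag, here is a minimal sketch using the array from the question. It assumes PHP 5.4 or later and a UTF-8 encoded source file.

```php
<?php
$a = ["尽", "exhausted||to exhaust", "jin3"];

// Default behaviour: non-ASCII characters are escaped as \uXXXX
$escaped = json_encode($a);

// With the flag, the character is emitted as-is in UTF-8
$unescaped = json_encode($a, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";   // ["\u5c3d","exhausted||to exhaust","jin3"]
echo $unescaped, "\n"; // ["尽","exhausted||to exhaust","jin3"]
```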

TRiG
2

Try this if you are posting data as JSON and want to submit Chinese, Japanese, Korean, or any other language's characters:

 json_encode($data, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES); 
Darkcoder
0

I did the same investigation on json_encode(). My conclusion is that you cannot alter the behavior, yet it doesn't cause any problems, so I will just live with that.

If you really don't like it, do preg_replace_callback() on json_encode() output and convert the code point hex back into UTF-8 characters.
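
The preg_replace_callback() approach could look roughly like the sketch below. The helper name unescape_json_unicode() is made up for illustration and is not part of PHP. The sketch leans on json_decode() to decode each run of escapes, so surrogate pairs are handled, and it deliberately leaves ASCII escapes (such as \u001f) alone because decoding them would produce invalid JSON.

```php
<?php
// Hypothetical helper (not a PHP built-in): turns \uXXXX escapes for
// non-ASCII characters in a JSON string back into literal UTF-8.
function unescape_json_unicode($json) {
    // (?<!\\)(?:\\\\)* skips escaped backslashes, so a literal backslash
    // followed by "u5c3d" is not mistaken for an escape sequence.
    // (?!00[0-7]) leaves escapes below U+0080 untouched.
    $pattern = '/(?<!\\\\)(?:\\\\\\\\)*\K(?:\\\\u(?!00[0-7])[0-9a-fA-F]{4})+/';

    return preg_replace_callback($pattern, function ($m) {
        // Decode the whole run at once so surrogate pairs stay together
        $decoded = json_decode('"' . $m[0] . '"');

        // If decoding fails (e.g. a lone surrogate), keep the original text
        return $decoded === null ? $m[0] : $decoded;
    }, $json);
}

$json = json_encode(["尽", "exhausted||to exhaust", "jin3"]);
echo unescape_json_unicode($json), "\n"; // ["尽","exhausted||to exhaust","jin3"]
```

On PHP 5.4 and later, passing JSON_UNESCAPED_UNICODE to json_encode() gives the same result without a second pass over the string.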

timdream
0

If you are using mysql to create the array you can use:

mysql_query("SET NAMES utf8");

This will return the result in UTF-8.

I have not tried it, but you may wish to look at the utf8_encode and utf8_decode functions in PHP.

Andrew Atkinson
  • It has the data in UTF-8 (if it didn't, `json_encode()` would return `null`). The Unicode-escaping is just the way `json_encode()` works, but it can be turned off. See my answer. – TRiG May 30 '12 at 19:39
0

json_encode() accepts option flags. Use it this way:

json_encode($data,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
ganji