
I get the hex code points of a UTF-8 string as \uXXXX escapes via json_encode, stripping the surrounding quotes:

substr(json_encode($str), 1, -1);

However, json_encode does not escape characters in the ASCII range. For example, for

sÆs

I get

s\u00C6s

but I want to get

\u0073\u00C6\u0073
Googlebot

1 Answer


I use json_encode for the multibyte characters and build the escape sequence by hand for the ASCII characters.

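// Convert every character of a UTF-8 string to \uXXXX notation.
function utf8toUnicode($str){
  $unicode = "";
  $len = mb_strlen($str, 'UTF-8');
  for($i=0;$i<$len;$i++){
    $utf8char = mb_substr($str,$i,1,'UTF-8');
    // Multibyte character: let json_encode build the escape, then drop the quotes.
    // Single-byte (ASCII) character: build \u00XX by hand from the byte's hex value.
    $unicode .= strlen($utf8char)>1
      ?trim(json_encode($utf8char),'"')
      :('\\u00'.bin2hex($utf8char))
    ;
  }
  return $unicode;
}

$str = 'sÆs';

echo utf8toUnicode($str);  // \u0073\u00c6\u0073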
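
As a rough sketch of the same idea, the string can also be split into characters with preg_split and the /u flag instead of looping with mb_substr. The function name escapeAllAsUnicode is made up for this example, and it assumes the input is valid UTF-8 (preg_split returns false otherwise). Decoding the result with json_decode is one way to check that the escapes round-trip.

```php
<?php
// Hypothetical helper (name invented for this sketch): escape every
// character of a UTF-8 string as \uXXXX, including the ASCII range.
function escapeAllAsUnicode(string $str): string
{
    // The "u" flag makes PCRE treat the subject as UTF-8, so an empty
    // pattern splits between whole characters rather than bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        // Assumption: invalid UTF-8 is treated as an error here.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $char) {
        $out .= strlen($char) > 1
            // Multibyte: json_encode produces the escape (surrogate pairs
            // for characters outside the BMP); strip the quotes.
            ? trim(json_encode($char), '"')
            // Single byte (ASCII): format the byte value as four hex digits.
            : sprintf('\\u%04x', ord($char));
    }
    return $out;
}

$escaped = escapeAllAsUnicode('sÆs');
echo $escaped, "\n";  // \u0073\u00c6\u0073

// Round-trip check: wrapping the escapes in quotes gives a JSON string
// literal that should decode back to the original text.
var_dump(json_decode('"' . $escaped . '"') === 'sÆs');  // bool(true)
```

Using sprintf with %04x instead of the fixed '\\u00' prefix changes nothing for ASCII input; it just makes the four-digit padding explicit.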
jspit