
I am having a problem converting Unicode escape sequences to human-readable text in PHP.

I have a string of Unicode escape sequences like the following:

$chars = "\u1006\u1092\u1019\u1021\u102c\u101b\u1036\u102f \u1019\u1002\u1062\u1007\u1004\u1039\u1038 (\u1042\u1040\u1041\u1046 \u1007\u1030\u101c\u102d\u102f\u1004\u1039)";

If I echo it like this:

echo $chars;

the escape sequences are printed as-is instead of being converted to a human-readable string. But if I echo it like this:

$text = '<script type="text/javascript">
document.write("\u1006\u1092\u1019\u1021\u102c\u101b\u1036\u102f \u1019\u1002\u1062\u1007\u1004\u1039\u1038 (\u1042\u1040\u1041\u1046 \u1007\u1030\u101c\u102d\u102f\u1004\u1039)");
</script>';

echo $text;

the browser prints the human-readable string.

That way I can show the result to the user. The problem is that I want to store the human-readable string in the database so that I can do other operations on it. So my questions are:

  1. How can I convert these Unicode escape sequences into a human-readable string in PHP?

OR

  2. How can I assign the result of the JavaScript in the second method to a string in PHP?

This is the same question I asked a while ago: Converting Unicode character to text in PHP is not working.

halfer
Wai Yan Hein

3 Answers


You can match each \uXXXX sequence with the regex /\\u([0-9a-f]{4})/i (written in a single-quoted PHP string as '/\\\\u([0-9a-f]{4})/iu'), capturing the four hex digits into Group 1. Inside a preg_replace_callback anonymous function, pack turns those digits into a two-byte binary string, which is the UTF-16BE form of that code unit, and mb_convert_encoding converts it to the target encoding. Since we pass a hexadecimal string to pack, its first argument, the format character, should be H:

H   Hex string, high nibble first
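
To see what the pack step does in isolation, here is a minimal sketch for a single escape. The variable names are illustrative, and UTF-8 is assumed as the target encoding; the mbstring extension must be loaded.

```php
<?php
// Illustrative only: decode the hex digits of one \uXXXX escape by hand.
$hex   = '1006';                 // the four hex digits the regex would capture
$utf16 = pack('H*', $hex);       // two raw bytes: 0x10 0x06 (UTF-16BE code unit)
$utf8  = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE'); // assumed target: UTF-8

echo bin2hex($utf16), "\n";      // 1006
echo bin2hex($utf8), "\n";       // e18086
```

Note that this handles one 16-bit code unit at a time, so characters outside the Basic Multilingual Plane written as surrogate pairs would need extra handling.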

See a PHP demo:

$chars = "\u1006\u1092\u1019\u1021\u102c\u101b\u1036\u102f \u1019\u1002\u1062\u1007\u1004\u1039\u1038 (\u1042\u1040\u1041\u1046 \u1007\u1030\u101c\u102d\u102f\u1004\u1039)";
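// The mbstring.internal_encoding ini setting is deprecated (PHP 5.6+) and may be empty, so ask mbstring directly.
$encoding = mb_internal_encoding();
// Match a literal backslash, "u" and four hex digits; convert each match from UTF-16BE.
$str = preg_replace_callback('/\\\\u([0-9a-f]{4})/iu', function ($match) use ($encoding) {
        return mb_convert_encoding(pack('H*', $match[1]), $encoding, 'UTF-16BE');
    }, $chars);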
echo $str;
Wiktor Stribiżew

You can use the Transliterator class from the intl extension:

$out = transliterator_create('Hex-Any')->transliterate($chars);
var_dump($out);

The built-in Hex-Any transliterator unescapes both \uXXXX and \UXXXXXXXX sequences.

I don't know if it's relevant in your case, but since PHP 7.0.0 you can write $chars as a string literal this way:
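
As a rough illustration of both escape forms, the sketch below feeds a mixed string through Hex-Any. It assumes the intl extension is installed; the sample string and variable names are made up for this example.

```php
<?php
// Single quotes keep the backslashes literal, like data read at runtime.
$escaped = '\u1019\u1002 \U0001F600';

// Transliterator::create() returns null if the ID cannot be resolved.
$decoder = Transliterator::create('Hex-Any');
if ($decoder === null) {
    exit("Could not create the Hex-Any transliterator\n");
}

$decoded = $decoder->transliterate($escaped);
echo $decoded, "\n";
```

The result is an ordinary UTF-8 string, so it can be stored in the database or passed to other string functions.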

$chars = "\u{1006}\u{1092}\u{1019}\u{1021}\u{102c}\u{101b}\u{1036}\u{102f} ...";
julp

PHP 7+

As of PHP 7, you can use the Unicode codepoint escape syntax, \u{XXXX}, inside double-quoted string literals to do this.

echo "\u{1006}\u{1092}\u{1019}\u{1021}\u{102c}\u{101b}\u{1036}\u{102f} \u{1019}\u{1002}\u{1062}\u{1007}\u{1004}\u{1039}\u{1038}";

outputs

ဆ႒မအာရံု မဂၢဇင္း.

Does that answer your question?
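
One caveat worth illustrating: the \u{...} syntax is resolved when PHP parses a literal in the source file, so it does not decode a string that already contains \uXXXX text at runtime. The sketch below shows the difference. The json_decode call at the end is not part of this answer; it is one possible way to decode such runtime data, and it assumes the input contains no unescaped double quotes or stray backslashes.

```php
<?php
// Resolved by the PHP parser: this is already the UTF-8 character U+1006.
$literal = "\u{1006}";

// Stand-in for data arriving at runtime: six literal characters \ u 1 0 0 6.
$runtime = '\u1006';

var_dump(strlen($literal));          // int(3)
var_dump(strlen($runtime));          // int(6)
var_dump($literal === $runtime);     // bool(false)

// Assumption: the input is safe to wrap in quotes and treat as a JSON string.
$decoded = json_decode('"' . $runtime . '"');
var_dump($decoded === $literal);     // bool(true)
```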

Rabin Lama Dong