
EDIT 2: I'd like to convert English words to Unicode numbers using PHP 5 and output them as \uXXXX, where XXXX is the Unicode number.

In my original question, I had mistakenly thought that \u was a standard for encoding Unicode, when in fact it is just an escape notation in JavaScript (thank you, Jukka K. Korpela, for pointing this out). Even though I wanted to do the conversion in PHP, the converted Unicode was to be used in JavaScript.

I tried the options below but had no luck. deceze's answer did the trick, though; thank you very much!

THINGS I TRIED

I've read that I can use iconv to do this, but I've had no luck and can't find any examples of how.
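One way iconv can be used for this, sketched under the assumption that JavaScript-style escapes are the goal: convert the string to UTF-16BE, whose two-byte units are exactly what JavaScript's \uXXXX escapes denote, then hex-encode each unit. `iconvToJsEscapes` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: escape every character of a UTF-8 string as
// JavaScript-style \uXXXX sequences, using iconv to do the transcoding.
function iconvToJsEscapes($str) {
    // UTF-16BE stores each code unit as two big-endian bytes (no BOM);
    // characters above U+FFFF become a surrogate pair (two code units).
    $utf16 = iconv('UTF-8', 'UTF-16BE', $str);
    if ($utf16 === false) {
        return false; // input was not valid UTF-8
    }
    $out = '';
    // Walk the converted string one code unit (two bytes) at a time.
    for ($i = 0; $i < strlen($utf16); $i += 2) {
        $out .= '\u' . bin2hex(substr($utf16, $i, 2));
    }
    return $out;
}

echo iconvToJsEscapes('test'), "\n"; // \u0074\u0065\u0073\u0074
```

Unlike a per-character approach, this treats ASCII and non-ASCII text the same way, because every character passes through the UTF-16 conversion.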

I've also tried Scott Reynen's code from "How to get code point number for a given character in a utf-8 string?", but I can't seem to get it to work. When I tried it, I included the script in a file along with

$str='test';
echo utf8_to_unicode($str);

It just echoed out test.

I've also read that I can use

echo json_encode("test");

but again I only get test printed to the screen.
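That output is expected: by default, json_encode escapes non-ASCII characters as \uXXXX but leaves ordinary ASCII letters unchanged (only characters JSON itself must escape, such as quotes, get special treatment), so a purely ASCII word comes back as-is apart from the surrounding quotes. A minimal check:

```php
<?php
// json_encode() escapes non-ASCII characters as \uXXXX by default,
// but plain ASCII letters come through unchanged.
echo json_encode("test"), "\n";        // "test"
echo json_encode("caf\xC3\xA9"), "\n"; // "caf\u00e9"  ("\xC3\xA9" is UTF-8 for é)
```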

Any help would be much appreciated.

EDIT 1: Actually, I think they are called code units, not code points.
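For characters in the Basic Multilingual Plane (up to U+FFFF), a \uXXXX escape contains exactly the code point, so the distinction only shows up above U+FFFF: JavaScript strings are sequences of UTF-16 code units, and such a character needs a surrogate pair of two escapes. A sketch, assuming the mbstring extension is available; `codePoint` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the code point of a single UTF-8 character.
// UCS-4BE stores each code point as one 32-bit big-endian integer,
// which unpack('N', ...) reads back as a PHP integer.
function codePoint($char) {
    $ucs4 = mb_convert_encoding($char, 'UCS-4BE', 'UTF-8');
    $unpacked = unpack('N', $ucs4);
    return $unpacked[1];
}

$grin = "\xF0\x9F\x98\x80"; // U+1F600 GRINNING FACE, encoded as UTF-8

// One code point...
printf("U+%04X\n", codePoint($grin));     // U+1F600
// ...but two UTF-16 code units (a surrogate pair) in the escaped form.
echo trim(json_encode($grin), '"'), "\n"; // \ud83d\ude00
```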

TryHarder
  • You may want to look at http://stackoverflow.com/questions/395832/how-to-get-code-point-number-for-a-given-character-in-a-utf-8-string – Daan Apr 11 '12 at 06:17
  • Thanks. I've looked at that, but I will look over it again. – TryHarder Apr 11 '12 at 06:23
  • 0054 is a Unicode number, also called Unicode code point, and conventionally written with the “U+” when used in text. Prefixing it with “\u” creates something that is not used in normal language and that acts as an escape notation in JavaScript literals. It is not clear at all what you mean here and why you would be doing it. When you already have a character, why would you generate a JavaScript escape for it, and where would you use it? – Jukka K. Korpela Apr 11 '12 at 07:00
  • Thanks for your reply. Originally I was storing the code points in MySQL, but I decided to store them as normal text instead. The text would be converted to Unicode code points in PHP, added to an array, and eventually used in JavaScript code. I don't want to do the conversion in JavaScript. Part of the reason I'm using Unicode is to make it harder for prying eyes to read. – TryHarder Apr 11 '12 at 07:11
  • Clarification: I'm trying to make it harder to read the answers from the JavaScript code. – TryHarder Apr 18 '12 at 07:11

1 Answer


json_encode pretty much does it for you, but only for non-ASCII characters. So all you need to do is convert the ASCII characters by hand. Here's a function that does that on a character-by-character basis:

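function utf8ToUnicodeCodePoints($str) {
    // Refuse input that is not valid UTF-8.
    if (!mb_check_encoding($str, 'UTF-8')) {
        trigger_error('$str is not encoded in UTF-8, I cannot work like this');
        return false;
    }
    // "u" makes "." match one whole UTF-8 character; "s" lets it match newlines too.
    return preg_replace_callback('/./su', function ($m) {
        $ord = ord($m[0]);
        if ($ord <= 127) {
            // ASCII: json_encode would leave it as-is, so escape it by hand.
            return sprintf('\u%04x', $ord);
        } else {
            // Non-ASCII: json_encode already produces \uXXXX escapes.
            return trim(json_encode($m[0]), '"');
        }
    }, $str);
}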
deceze
  • Tip: I also made a small change so that only non-English, non-printable characters are converted; simply change return sprintf('\u%04x', $ord); to return $m[0]; – Sagi Mann Aug 23 '12 at 09:36
  • Thanks @deceze, you helped me decode an AXt Arabic font! http://forums.adobe.com/message/5721002#5721002 – numediaweb Sep 28 '13 at 11:14
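The tip about returning $m[0] for ASCII leaves every ASCII character unescaped, control characters included. A sketch of a variant closer to that tip's description, where `escapeNonPrintable` is a hypothetical name and "printable" is assumed to mean ASCII 0x20 to 0x7E:

```php
<?php
// Hypothetical variant: printable ASCII (0x20-0x7E) is left as-is;
// ASCII control characters and all non-ASCII characters become \uXXXX.
function escapeNonPrintable($str) {
    if (!mb_check_encoding($str, 'UTF-8')) {
        trigger_error('$str is not encoded in UTF-8', E_USER_WARNING);
        return false;
    }
    // "u" matches one UTF-8 character at a time; "s" includes newlines.
    return preg_replace_callback('/./su', function ($m) {
        $ord = ord($m[0]);
        if ($ord >= 0x20 && $ord <= 0x7E) {
            return $m[0];                   // printable ASCII: keep
        }
        if ($ord < 0x80) {
            return sprintf('\u%04x', $ord); // ASCII control character
        }
        // Non-ASCII: json_encode emits \uXXXX (a surrogate pair above U+FFFF).
        return trim(json_encode($m[0]), '"');
    }, $str);
}

echo escapeNonPrintable("caf\xC3\xA9\n"), "\n"; // caf\u00e9\u000a
```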