Hex to Unicode in PHP ( \u014D to ō)

Question

Possible Duplicate:
How to decode Unicode escape sequences like “\u00ed” to proper UTF-8 encoded characters?

How can I convert \u014D to ō in PHP?

Thank You

This post is about Python, but it may help :) http://stackoverflow.com/questions/867866/convert-unicode-codepoint-to-utf8-hex-in-python – SubniC, Nov 26 '10 at 10:33

score 3 · Accepted Answer · answered Nov 26 '10 at 12:45


It's not immediately clear what you mean when you say "to ō". If you're asking how to convert it into a different encoding, then a general approach is to use the iconv function. 014D is the UCS-2 (Unicode) code for your desired character, so if you have a string containing the bytes 01 4D you could use

iconv('UCS-2', 'UTF-8', $s)

to convert from UCS-2 to UTF-8. The same approach works if you want to convert to a different encoding, although you need to be aware that not every encoding includes the character you are using. The iconv documentation describes the //TRANSLIT option, which may help in that case.
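To make the //TRANSLIT point concrete, here is a minimal sketch. It assumes the target is ISO-8859-1 (my choice for illustration, not something from the question), which has no ō. What you get back with //TRANSLIT depends on the iconv implementation and the locale, so no particular output is promised here.

```php
<?php
// U+014D (ō) written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of this source file.
$utf8 = "\xC5\x8D";

// ISO-8859-1 has no ō, so a strict conversion is expected to fail:
// iconv() typically returns false and raises a notice (silenced with @).
$strict = @iconv('UTF-8', 'ISO-8859-1', $utf8);

// With //TRANSLIT, iconv may substitute a look-alike such as "o" or a
// placeholder such as "?". The exact result is implementation-dependent.
$approx = @iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $utf8);

var_dump($strict, $approx);
```

Whatever comes back, it is an approximation at best: the original character cannot be recovered from the transliterated output.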

Note that iconv takes a byte sequence, so if you actually have a string containing a backslash, then a u, then a 0 and so on, you'll need to convert that into the byte sequence first.
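A rough sketch of that two-step conversion, in case it helps. The helper name decode_u_escapes is hypothetical, and I'm assuming big-endian byte order ('UCS-2BE') because that matches how the hex digits are written; plain 'UCS-2' may be read in platform byte order on some systems.

```php
<?php
// Hypothetical helper (not part of the answer above): decode every
// literal \uXXXX escape in a string into its UTF-8 character.
function decode_u_escapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',   // a literal backslash, "u", four hex digits
        function (array $m): string {
            // Step 1: pack the four hex digits into two raw bytes,
            // e.g. "014D" becomes 0x01 0x4D.
            $bytes = pack('H*', $m[1]);
            // Step 2: convert those bytes to UTF-8. Byte order is stated
            // explicitly (assumption: big-endian, as the digits are written).
            return iconv('UCS-2BE', 'UTF-8', $bytes);
        },
        $text
    );
}

// Single quotes: PHP leaves \u014D as six literal characters.
$s = 'T\u014Dky\u014D';

echo bin2hex(pack('H*', '014D')), "\n"; // 014d
echo decode_u_escapes($s), "\n";        // Tōkyō
```

This only handles four-digit escapes one at a time, so characters written as a surrogate pair (two consecutive \uXXXX escapes) are not covered by this sketch.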

answered Nov 26 '10 at 12:45

borrible


@borrible - This does not seem correct; do you want to re-test? – ajreal Nov 26 '10 at 12:48
@ajreal - Make sure you're testing on a UCS-2 byte sequence. – borrible Nov 26 '10 at 12:59
@borrible - I tried the code you posted. It does not seem to work when `$s = '\u014D';`, as the result is converted to invalid Chinese characters. – ajreal Nov 26 '10 at 13:08
@ajreal - The code you give is not putting a UCS-2 byte sequence into $s; it's putting a \ followed by a u followed by a 0 and so on. You can see this if you `echo bin2hex($s)`. As mentioned in my answer, the iconv call is designed for a string containing the byte 01 followed by 4D. – borrible Nov 26 '10 at 13:34
@borrible - Would you mind updating your answer to show how to get $s into a UCS-2 byte sequence? – ajreal Nov 26 '10 at 13:43
@borrible, @ajreal: UCS-2 is not somehow equivalent to Unicode as you allege. UCS-2 can no more hold all of Unicode than can ASCII hold all of Greek or Latin. Anyway, who still uses UCS-2? Talk about putting the backwards in backwards-compatible! UCS-2 is such an antemillennial legacy that it’s embarrassing to see people talk about. Unicode is a logical set of numbers, whereas UTF-8, UTF-16, and UTF-32 are physical transforms that preserve all code points — unlike the *ipso facto* broken UCS-2! – tchrist Nov 26 '10 at 21:48

score 1 · Answer 2 · answered Nov 26 '10 at 12:57

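If the string contains the escape sequence as literal characters, you could use a messy eval statement.

$string = '\\u014D';
// PHP 7+ only interprets the braced form \u{014D}, and only in double quotes, so rewrite the escape first.
$code = preg_replace('/\\\\u([0-9A-Fa-f]{4})/', '\\u{$1}', $string);
eval('$string = "' . $code . '";');

This way, the Unicode escape sequence is recognized and interpreted as a Unicode character when the string is parsed.

Of course, you should never use eval unless absolutely necessary, and never on input you do not control.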
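Since running code built from input is risky, here is a sketch of an alternative that avoids it. This is a different technique from the one in this answer: JSON string literals use the same \uXXXX notation, so json_decode can do the parsing. It assumes the input contains no unescaped double quotes or control characters, which would make the JSON invalid.

```php
<?php
// Single quotes: the six literal characters backslash, u, 0, 1, 4, D.
$escaped = '\u014D';

// Wrap the text in double quotes to form a JSON string literal and let
// json_decode interpret the escape. Assumption: $escaped contains no
// unescaped double quotes or control characters.
$decoded = json_decode('"' . $escaped . '"');

echo $decoded, "\n"; // ō
```

If the input is not valid inside a JSON string, json_decode returns null, so it is worth checking the result before using it.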
