2

Possible Duplicate:
How to decode Unicode escape sequences like “\u00ed” to proper UTF-8 encoded characters?

How can I convert \u014D to ō in PHP?

Thank You

bbnn

2 Answers

3

It's not immediately clear what you mean when you say "to ō". If you're asking how to convert it into a different encoding, then a general approach is to use the iconv function. 014D is the UCS-2 (Unicode) code for your desired character so, if you have a string containing the two bytes 0x01 0x4D, you could use

iconv('UCS-2', 'UTF-8', $s)

to convert from UCS-2 to UTF-8. The same approach works if you want to convert to a different encoding, although you need to be aware that not all encodings include the character you are using. You'll see from the iconv documentation that appending //TRANSLIT to the target encoding may help in that case.

Note that iconv operates on a byte sequence so, if you actually have a string containing a backslash, then a u, then a 0 and so on, you'll need to convert that into the byte sequence first.
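That conversion step might look roughly like the sketch below. The helper name ucs2EscapesToUtf8 is made up for illustration, and the code assumes the iconv extension is available. It also uses 'UCS-2BE' rather than plain 'UCS-2' so the byte order does not depend on the iconv implementation, which is an assumption on top of the call shown above. Escapes that iconv cannot convert (lone surrogates, for example) are left untouched.

```php
<?php
// Hypothetical helper: replace each literal \uXXXX in $text with its UTF-8 form.
function ucs2EscapesToUtf8(string $text): string
{
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function (array $m): string {
            // "014D" -> the two bytes 0x01 0x4D (big-endian UCS-2)
            $bytes = pack('H*', $m[1]);
            $utf8 = iconv('UCS-2BE', 'UTF-8', $bytes);
            // Keep the original escape if iconv rejects the bytes
            return $utf8 === false ? $m[0] : $utf8;
        },
        $text
    );
}

$s = '\u014D';
echo bin2hex($s), "\n";                     // 5c7530313444: a backslash, a u, then digits
echo bin2hex(ucs2EscapesToUtf8($s)), "\n";  // c58d: the UTF-8 bytes of ō
```

The bin2hex output makes it easy to see the difference between the six-character escape and the two-byte UTF-8 result.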

borrible
  • @borrible - This does not seem correct; do you want to re-test? – ajreal Nov 26 '10 at 12:48
  • @ajreal - make sure you're testing on a UCS-2 byte sequence. – borrible Nov 26 '10 at 12:59
  • @borrible - I tried the code you posted; it does not seem to work if `$s = '\u014D';`, it is converted to invalid Chinese characters – ajreal Nov 26 '10 at 13:08
  • @ajreal - The code you give is not putting a UCS-2 byte sequence into $s, it's putting a \ followed by a u followed by a 0 etc... You can see this if you `echo bin2hex($s)`. As mentioned in my answer the iconv is designed for a string containing the byte 01 followed by 4D. – borrible Nov 26 '10 at 13:34
  • @borrible - Would you mind updating your answer to show how to get $s into a UCS-2 byte sequence? – ajreal Nov 26 '10 at 13:43
  • @borrible, @ajreal: UCS-2 is not somehow equivalent to Unicode as you allege. UCS-2 can no more hold all of Unicode than can ASCII hold all of Greek or Latin. Anyway, who still uses UCS-2? Talk about putting the backwards in backwards-compatible! UCS-2 is such an antemillennial legacy that it’s embarrassing to see people talk about. Unicode is a logical set of numbers, whereas UTF-8, UTF-16, and UTF-32 are physical transforms that preserve all code points — unlike the *ipso facto* broken UCS-2! – tchrist Nov 26 '10 at 21:48
1

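If you have the escape sequence in the string you could use a messy eval statement (eval, not exec: exec runs a shell command rather than PHP code). PHP 7 and later only interpret the braced form \u{014D}, and only inside double quotes, so the escape has to be rewritten first.

$string = '\\u014D';
$code = preg_replace('/\\\\u([0-9a-fA-F]{4})/', '\\\\u{$1}', $string); // \u014D becomes \u{014D}
eval("\$string = \"$code\";");

This way, the Unicode escape sequence should be recognized and interpreted as a Unicode character when the string is parsed.

Of course, you should never use eval unless absolutely necessary, and never on input you do not control, because anything else in the string is executed as PHP code.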
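One way to get the same result without evaluating the string as code is to let json_decode parse the escapes, since JSON string literals use the same \uXXXX notation. This is only a sketch: the function name decodeUnicodeEscapes is invented here, and it assumes the json extension is enabled. Consecutive escapes are decoded as one run so that surrogate pairs stay together, and a run that is not valid JSON is left unchanged.

```php
<?php
// Hypothetical helper: decode literal \uXXXX sequences by treating them as JSON.
function decodeUnicodeEscapes(string $text): string
{
    return preg_replace_callback(
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        function (array $m): string {
            // The match holds only \uXXXX escapes, so quoting it yields a JSON string
            $decoded = json_decode('"' . $m[0] . '"');
            // json_decode returns null on invalid input (e.g. a lone surrogate)
            return is_string($decoded) ? $decoded : $m[0];
        },
        $text
    );
}

echo decodeUnicodeEscapes('T\u014Dky\u014D'), "\n"; // Tōkyō
```

Only the escape sequences are ever passed to json_decode, so quotes or other characters elsewhere in the input cannot change how it is parsed.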

Youarefunny