I need to convert Unicode code points, such as U+0123 (Latin Small Letter G With Cedilla), to their UTF-8 bytes in hex, like 0xC4 0xA3 (or c4a3). I know there's a function (or combination of functions) I can use to accomplish this in PHP, but I can't seem to get it right.
- I guess it might be easier to help you with that function if you at least mentioned its name... – Álvaro González Dec 30 '10 at 08:19
- I was trying to adapt the code provided here: http://stackoverflow.com/questions/2670039/php-utf-16-to-utf-8hex-conversion, namely mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');, but despite using bin2hex on the result I didn't get the bytes I'm aiming for – MarathonStudios Dec 30 '10 at 08:26
1 Answer
The function in the answer you are linking to works fine as is but you must take some things into account:
- The function expects a number (e.g. 0x0123), not a string ('U+0123')
- Your output must be displayed as UTF-8
- You may need to call mb_internal_encoding('UTF-8') (I've found that some systems have a wrong default)
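To illustrate the first point, here is a minimal sketch of turning the 'U+0123' notation into the plain integer the linked function expects. The helper name parse_code_point is made up for this example, and the range check against 0x10FFFF is my own addition:

```php
<?php
// Hypothetical helper: convert 'U+XXXX' notation into an integer code point.
// Returns null when the input is not in that notation or is out of range.
function parse_code_point(string $notation): ?int
{
    // Code points are written in hexadecimal, 4 to 6 digits after 'U+'.
    if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $m) !== 1) {
        return null;
    }
    $value = (int) hexdec($m[1]);
    // Unicode ends at U+10FFFF.
    return $value <= 0x10FFFF ? $value : null;
}

var_dump(parse_code_point('U+0123')); // int(291), i.e. 0x0123
var_dump(parse_code_point('0123'));   // NULL, because the 'U+' prefix is missing
```

The integer result is what you would then pass on as the numeric entity value.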
Anyway, I've written a small variation that accepts a Unicode code point in 'U+XXXX' notation, in case that's exactly what you need:
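    <?php
    header('Content-Type: text/plain; charset=utf-8');
    mb_internal_encoding('UTF-8');

    // Accepts 'U+XXXX' notation (4 to 6 hex digits) and returns the UTF-8 character.
    function unicode_code_point_to_char($code_point) {
        // Code points are hexadecimal, so match hex digits rather than \d.
        if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $code_point, $matches)) {
            // $matches[1] holds the captured hex digits; build a decimal numeric entity from them.
            return mb_convert_encoding('&#' . hexdec($matches[1]) . ';', 'UTF-8', 'HTML-ENTITIES');
        } else {
            return null;
        }
    }

    echo unicode_code_point_to_char('U+0123');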
Update:
I've just noticed that I have misread your question. Try this:
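    // Accepts 'U+XXXX' notation and returns the UTF-8 bytes as a hex string (e.g. 'c4a3').
    function unicode_code_point_to_hex_string($code_point) {
        // Code points are hexadecimal, so match hex digits rather than \d.
        if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $code_point, $matches)) {
            // $matches[1] holds the captured hex digits; bin2hex() dumps the resulting bytes.
            return bin2hex(mb_convert_encoding('&#' . hexdec($matches[1]) . ';', 'UTF-8', 'HTML-ENTITIES'));
        } else {
            return null;
        }
    }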
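If you are on PHP 7.2 or later, mb_chr() gets you there more directly, and the 'HTML-ENTITIES' pseudo-encoding used above has been deprecated since PHP 8.2. What follows is only a sketch under those assumptions; both function names are my own. The second function does the UTF-8 bit arithmetic by hand, which may help if mbstring is not available or you just want to see where c4a3 comes from:

```php
<?php
// Hypothetical variant using mb_chr() (needs mbstring, PHP >= 7.2).
function code_point_to_utf8_hex(string $notation): ?string
{
    if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $m) !== 1) {
        return null;
    }
    // mb_chr() returns false for invalid code points such as surrogates.
    $char = mb_chr((int) hexdec($m[1]), 'UTF-8');
    return $char === false ? null : bin2hex($char);
}

// Hypothetical manual encoder: integer code point to UTF-8 bytes as hex.
function utf8_hex_manual(int $cp): ?string
{
    // Reject values outside Unicode and the surrogate range.
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        return null;
    }
    if ($cp < 0x80) {            // 1 byte:  0xxxxxxx
        $bytes = [$cp];
    } elseif ($cp < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
        $bytes = [0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F)];
    } elseif ($cp < 0x10000) {   // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $bytes = [0xE0 | ($cp >> 12), 0x80 | (($cp >> 6) & 0x3F), 0x80 | ($cp & 0x3F)];
    } else {                     // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        $bytes = [
            0xF0 | ($cp >> 18),
            0x80 | (($cp >> 12) & 0x3F),
            0x80 | (($cp >> 6) & 0x3F),
            0x80 | ($cp & 0x3F),
        ];
    }
    return bin2hex(pack('C*', ...$bytes));
}

echo utf8_hex_manual(0x0123), "\n"; // c4a3
```

For U+0123 the value fits in 11 bits, so it takes the two-byte form: the top five bits go into 0xC4 and the low six bits into 0xA3.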

Álvaro González