I need to convert Unicode code points, such as U+0123 (Latin Small Letter G With Cedilla), to their hex-encoded UTF-8 bytes, in this case 0xC4 0xA3 (or c4a3). I know there's a function (or combination of functions) I can use to accomplish this in PHP, but I can't seem to get it right.

MarathonStudios
  • I guess it might be easier to help you with that function if you at least mentioned its name... – Álvaro González Dec 30 '10 at 08:19
  • I was trying to mess with the code provided here: http://stackoverflow.com/questions/2670039/php-utf-16-to-utf-8hex-conversion, namely mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');, but despite using bin2hex on the result I didn't get the bytes I'm aiming for – MarathonStudios Dec 30 '10 at 08:26

1 Answer

The function in the answer you are linking to works fine as is but you must take some things into account:

  • The function expects a number (e.g. 0x0123) not a string ('U+0123')
  • Your output must be displayed as UTF-8
  • You may need to call mb_internal_encoding('UTF-8') (I've found that some systems have a wrong default)
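
To illustrate the first point, here is a minimal sketch of turning the 'U+0123' notation into the number the linked function expects. The helper name parse_code_point_notation is made up for this example, and the upper bound check assumes you only want values inside the Unicode range (up to U+10FFFF):

```php
<?php

// Hypothetical helper (not part of PHP or the linked answer):
// converts 'U+0123' notation into the integer code point 0x0123.
function parse_code_point_notation(string $notation): ?int {
    // The digits after 'U+' are hexadecimal, so A-F must be accepted too.
    if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $matches) !== 1) {
        return null;
    }
    $value = (int) hexdec($matches[1]);
    // Reject anything beyond the last Unicode code point.
    return $value <= 0x10FFFF ? $value : null;
}

var_dump(parse_code_point_notation('U+0123')); // int(291), i.e. 0x0123
```

The resulting integer can then be passed to the function from the linked answer in place of the string.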

In any case, I've written a little variation that accepts a code point in U+XXXX notation, in case that's exactly what you need:

<?php

header('Content-Type: text/plain; charset=utf-8');

mb_internal_encoding('UTF-8');

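function unicode_code_point_to_char($code_point) {
    // The digits after 'U+' are hexadecimal, so match A-F as well as 0-9.
    if( preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $code_point, $matches) ){
        // $matches[1] holds only the hex digits; build a decimal numeric entity from them.
        return mb_convert_encoding('&#' . hexdec($matches[1]) . ';', 'UTF-8', 'HTML-ENTITIES');
    }else{
        return NULL;
    }
}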

echo unicode_code_point_to_char('U+0123');

Update:

I've just noticed that I have misread your question. Try this:

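function unicode_code_point_to_hex_string($code_point) {
    // The digits after 'U+' are hexadecimal, so match A-F as well as 0-9.
    if( preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $code_point, $matches) ){
        // Convert the entity to a UTF-8 character, then dump its bytes as hex (e.g. "c4a3").
        return bin2hex(mb_convert_encoding('&#' . hexdec($matches[1]) . ';', 'UTF-8', 'HTML-ENTITIES'));
    }else{
        return NULL;
    }
}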
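
On newer PHP versions the same result can be had without the 'HTML-ENTITIES' pseudo-encoding, which mbstring deprecates as of PHP 8.2. The following is only a sketch under a few assumptions: PHP 7.2 or later with mbstring enabled (that is when mb_chr was added), and a made-up function name code_point_to_utf8_hex with a made-up $spaced flag to pick between the two output styles mentioned in the question:

```php
<?php

// Hypothetical alternative: code point notation -> hex-encoded UTF-8 bytes.
// Assumes PHP >= 7.2 with the mbstring extension (for mb_chr).
function code_point_to_utf8_hex(string $notation, bool $spaced = false): ?string {
    if (preg_match('/^U\+([0-9A-Fa-f]{4,6})$/', $notation, $matches) !== 1) {
        return null;
    }
    // mb_chr returns false when it cannot encode the given code point.
    $char = mb_chr((int) hexdec($matches[1]), 'UTF-8');
    if ($char === false) {
        return null;
    }
    $hex = bin2hex($char); // e.g. "c4a3"
    if (!$spaced) {
        return $hex;
    }
    // "c4a3" -> "0xC4 0xA3"
    return '0x' . implode(' 0x', str_split(strtoupper($hex), 2));
}

echo code_point_to_utf8_hex('U+0123'), "\n";       // c4a3
echo code_point_to_utf8_hex('U+0123', true), "\n"; // 0xC4 0xA3
```

Since bin2hex works on raw bytes, the result does not depend on how the page or terminal displays the character.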
Álvaro González