
I have some Greek text that I want to convert to hexadecimal Unicode code points without spaces, as one long string.

This question is exactly what I was looking for: "Unicode Hexadecimal code points for PHP". However, it doesn't include the actual code showing how it was done.

George D.

2 Answers


Based on the original code and the answer to the question "How to get code point number for a given character in a utf-8 string?", I put together this function:

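function utf8_to_unicode($str) {

    $unicode = array();   // hex code points, one entry per character
    $values = array();    // bytes collected for the current multi-byte character
    $lookingFor = 1;      // number of bytes expected for the current character
    $length = strlen($str);

    for ($i = 0; $i < $length; $i++) {

        $thisValue = ord($str[$i]);

        if ($thisValue < 128) {
            // ASCII: the byte value is the code point.
            $unicode[] = str_pad(strtoupper(dechex($thisValue)), 4, "0", STR_PAD_LEFT);
        } else {
            // A lead byte below 224 starts a 2-byte sequence; otherwise assume 3 bytes.
            // 4-byte sequences (code points above U+FFFF) are not handled here.
            if (count($values) == 0) $lookingFor = ($thisValue < 224) ? 2 : 3;
            $values[] = $thisValue;
            if (count($values) == $lookingFor) {
                // Strip the UTF-8 marker bits and combine the payload bits.
                $number = ($lookingFor == 3) ?
                    (($values[0] % 16) * 4096) + (($values[1] % 64) * 64) + ($values[2] % 64) :
                    (($values[0] % 32) * 64) + ($values[1] % 64);
                $number = strtoupper(dechex($number));
                $unicode[] = str_pad($number, 4, "0", STR_PAD_LEFT);
                $values = array();
                $lookingFor = 1;
            }
        }
    }
    return $unicode;
} // utf8_to_unicode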

So:

$greekString = "ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ ";
$hexArray = utf8_to_unicode($greekString);
echo implode("", $hexArray);

Will output:

039103920393039403950396039703980399039A039B039C039D039E039F03A003A103A303A403A503A603A703A803A90020
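
The modulo arithmetic in the multi-byte branch of the function may be easier to follow with bit operations. The following is a minimal sketch that decodes a single 2-byte character by hand; the helper name decode_two_bytes is hypothetical and exists only for illustration.

```php
<?php
// Hypothetical helper (not part of the answer): decode one 2-byte
// UTF-8 sequence into its Unicode code point.
function decode_two_bytes(string $char): int {
    $lead = ord($char[0]); // lead byte 110xxxxx: the low 5 bits carry data
    $cont = ord($char[1]); // continuation byte 10xxxxxx: the low 6 bits carry data
    // ($lead % 32) * 64 + ($cont % 64) written with masks and a shift.
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

// Greek capital alpha is stored in UTF-8 as the bytes 0xCE 0x91.
printf("%04X\n", decode_two_bytes("\xCE\x91")); // prints 0391
```

Here 0xCE & 0x1F is 14 and 0x91 & 0x3F is 17, so the result is 14 * 64 + 17 = 913, which is 0x391.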
Kenny Linsky
  • Kenny, there is something wrong with your code, most probably at `if ($thisValue < 128) $unicode[] = str_pad($thisValue, 4, "0", STR_PAD_LEFT);` – George D. Jan 03 '13 at 15:03
  • For the input "Γιώργο αν στείλεις αυτό ακριβώς this is a test", the right conversion = 0393 03B9 03CE 03C1 03B3 03BF 0020 03B1 03BD 0020 03C3 03C4 03B5 03AF 03BB 03B5 03B9 03C2 0020 03B1 03C5 03C4 03CC 0020 03B1 03BA 03C1 03B9 03B2 03CE 03C2 0020 0074 0068 0069 0073 0020 0069 0073 0020 0061 0020 0074 0065 0073 0074; your script = 0393 03B9 03CE 03C1 03B3 03BF 0032 03B1 03BD 0032 03C3 03C4 03B5 03AF 03BB 03B5 03B9 03C2 0032 03B1 03C5 03C4 03CC 0032 03B1 03BA 03C1 03B9 03B2 03CE 03C2 0032 0116 0104 0105 0115 0032 0105 0115 0032 0097 0032 0116 0101 0115 0116 – George D. Jan 03 '13 at 15:07
  • The problem is with any non-Greek character, such as A-Z, space, etc. – George D. Jan 03 '13 at 16:40
  • You're right, I think the line should be `$unicode[] = str_pad(dechex($thisValue), 4, "0", STR_PAD_LEFT);` – Kenny Linsky Jan 04 '13 at 01:08
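
The bug discussed in these comments comes down to padding a decimal byte value instead of its hexadecimal form. A short sketch of the difference, using a space character as the example (the variable names are only illustrative):

```php
<?php
$byte = ord(' '); // 32 in decimal, 0x20 in hexadecimal

// Padding the decimal value gives a string that only looks like hex.
$decimalPadded = str_pad($byte, 4, "0", STR_PAD_LEFT);

// Converting to hex first gives the actual code point, U+0020.
$hexPadded = str_pad(dechex($byte), 4, "0", STR_PAD_LEFT);

echo $decimalPadded, "\n"; // prints 0032
echo $hexPadded, "\n";     // prints 0020
```

Greek characters were not affected because they always went through the multi-byte branch, which already called dechex().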

This works for me:

header('Content-Type: text/html; charset=utf-8');

// Convert to UTF-16BE, then hex-encode the bytes (4 lowercase hex digits per BMP character).
echo bin2hex(iconv('UTF-8', 'UTF-16BE', 'your message'));
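
One caveat with the UTF-16BE approach is that characters outside the Basic Multilingual Plane come out as surrogate pairs rather than code points. A possible variant, sketched here under the assumption that the iconv extension is available, goes through UTF-32BE instead; the function name utf8_to_hex_codepoints is hypothetical.

```php
<?php
// Hypothetical helper: return the code points of a UTF-8 string as one
// uppercase hex string, each code point padded to at least 4 digits.
function utf8_to_hex_codepoints(string $text): string {
    if ($text === '') {
        return '';
    }
    // UTF-32BE stores every code point as one big-endian 32-bit integer.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $text);
    if ($utf32 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $hex = '';
    // 'N*' unpacks the bytes as a list of unsigned 32-bit big-endian integers.
    foreach (unpack('N*', $utf32) as $codePoint) {
        $hex .= sprintf('%04X', $codePoint);
    }
    return $hex;
}

echo utf8_to_hex_codepoints("ΑΒΓ abc"), "\n"; // prints 0391039203930020006100620063
```

Code points above U+FFFF need five or six hex digits, so a string without separators is only unambiguous to split back into characters when every code point fits in four digits.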
Paresh Mayani
Isa