4

Let's say I have the word "Russian" written in Cyrillic. This would be the equivalent of the following in hex:

&#1056;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;

My question is: how do I write a function that will go from "Russian" in Cyrillic to its hex value as above? Could this same function also work for single-byte characters?

knittl
Adrien Hingert

2 Answers

5

The &#12345; thingies are called HTML entities (more precisely, numeric character references). In PHP there is a function that can create them: mb_encode_numericentity, which is part of the Multibyte String extension:

$cyrillic = 'русский';

$encoding = 'UTF-8';
$convmap = array(0, 0xffff, 0, 0xffff);
$encoded = mb_encode_numericentity($cyrillic, $convmap, $encoding);

echo $encoded; # &#1088;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;

However, you need to know the encoding of your Cyrillic string. In this case I've chosen UTF-8; for a different encoding you need to adjust the $encoding parameter of the function and the $convmap array.
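As a rough sketch of how the conversion map handles the "single-byte characters" part of the question: each group of four integers in $convmap is (start code point, end code point, offset, mask). The map below is my own assumption, not the one used above; it starts at 0x80 so that plain ASCII passes through unchanged, and it assumes the source file is saved as UTF-8 and that the mbstring extension is loaded.

```php
<?php
// Assumed map: encode code points 0x80..0x10FFFF, leave ASCII untouched.
// The four values are [start, end, offset, mask].
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

$mixed = 'Hi, русский';

// ASCII stays as-is; every Cyrillic letter becomes a decimal entity.
$encoded = mb_encode_numericentity($mixed, $convmap, 'UTF-8');
echo $encoded, "\n"; // Hi, &#1088;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;

// The same map converts the entities back to the original string.
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
var_dump($decoded === $mixed); // bool(true)
```

With a map starting at 0 instead, ASCII characters would be turned into entities as well, which is usually not what you want for mixed text.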

hakre
2

Your provided example isn't hex, but if you want to convert to hex, try this:

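function strToHex($string)
{
    $hex='';
    for ($i=0; $i < strlen($string); $i++)
    {
        // Zero-pad each byte to two digits so values below 0x10 (e.g. "\n") survive hexToStr()
        $hex .= str_pad(dechex(ord($string[$i])), 2, '0', STR_PAD_LEFT);
    }
    return $hex;
}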

function hexToStr($hex)
{
    $string='';
    for ($i=0; $i < strlen($hex)-1; $i+=2)
    {
        $string .= chr(hexdec($hex[$i].$hex[$i+1]));
    }
    return $string;
}

echo strToHex('русский'); // d180d183d181d181d0bad0b8d0b9
AlienWebguy
  • One note: control characters (such as \n) cause issues because dechex() drops the leading zero, so you'll need to zero-pad them by changing the " dechex(ord($string[$i])) " bit to " str_pad(dechex(ord($string[$i])), 2, "0", STR_PAD_LEFT) " in strToHex(). But a fantastic answer overall, thanks _very_ much for this :-) – Dave Carpeneto Jul 06 '16 at 13:25
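
For comparison, the zero-padding concern raised in the comment can also be sidestepped with PHP's built-in bin2hex() and hex2bin(), which work on raw bytes and always emit two hex digits per byte. This is only a sketch of that alternative, not what the answer itself uses; it assumes the source file is saved as UTF-8 and PHP 5.4 or later (for hex2bin()).

```php
<?php
$word = 'русский';

// bin2hex() emits exactly two lowercase hex digits per byte.
$hex = bin2hex($word);
echo $hex, "\n"; // d180d183d181d181d0bad0b8d0b9

// A newline is byte 0x0a; the leading zero is kept.
$withNewline = bin2hex("a\nb");
echo $withNewline, "\n"; // 610a62

// hex2bin() reverses the conversion byte for byte.
var_dump(hex2bin($hex) === $word); // bool(true)
```

Because both functions operate on bytes rather than characters, they behave the same for single-byte and multibyte text; the hex string simply reflects whatever encoding the input was in.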