This PHP function return the decimal number of the first character in string.
- If the number is lower than 128 then the character is encoded in 1 octet.
- Elseif the number is lower than 2048 then the character is encoded in 2 octets.
- Elseif the number is lower than 65536 then the character is encoded in 3 octets.
- Elseif the number is lower than 1114112 then the character is encoded in 4 octets.
function ord_utf8($s){
return (int) ($s=unpack('C*',$s[0].$s[1].$s[2].$s[3]))&&$s[1]<(1<<7)?$s[1]:
($s[1]>239&&$s[2]>127&&$s[3]>127&&$s[4]>127?(7&$s[1])<<18|(63&$s[2])<<12|(63&$s[3])<<6|63&$s[4]:
($s[1]>223&&$s[2]>127&&$s[3]>127?(15&$s[1])<<12|(63&$s[2])<<6|63&$s[3]:
($s[1]>193&&$s[2]>127?(31&$s[1])<<6|63&$s[2]:0)));
}
echo ord_utf8('€');
// Output 8364 then this character is encoded in 3 octets
You can check the result in https://eval.in/748181 …
The ord_utf8 function is the reciprocal of chr_utf8 (print one utf8 character from decimal number)
function chr_utf8($n,$f='C*'){
return $n<(1<<7)?chr($n):($n<1<<11?pack($f,192|$n>>6,1<<7|191&$n):
($n<(1<<16)?pack($f,224|$n>>12,1<<7|63&$n>>6,1<<7|63&$n):
($n<(1<<20|1<<16)?pack($f,240|$n>>18,1<<7|63&$n>>12,1<<7|63&$n>>6,1<<7|63&$n):'')));
}
for($test=1;$test<1114111;$test++)
if (ord_utf8(chr_utf8($test))!==$test)
die('Error found');
echo 'No error';
// Output No error