PHP lacks an mb_ord() function, that is, something that does what the ord() function does, but for UTF-8 (or "mb", multibyte, hence "mb_ord"). I used some clues from here:
$ord = hexdec( bin2hex($utf8char) ); //decimal
and I suppose that mb_substr($text, $i, 1, 'UTF-8')
returns exactly one UTF-8 character. But $ord does not hold the values I expect.
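A quick check suggests what that expression actually computes. The sketch below spells the plus-minus sign out as explicit UTF-8 bytes (an assumption made so the result does not depend on how the source file is saved) and prints the intermediate hex string. The result looks like the raw bytes read as one integer, not the Unicode code point:

```php
<?php
// "±" is U+00B1 (decimal 177); in UTF-8 it is stored as the two bytes C2 B1.
$utf8char = "\xC2\xB1";

$hex = bin2hex($utf8char);   // hex dump of the raw bytes
$ord = hexdec($hex);         // those bytes interpreted as a single integer

echo "$hex -> $ord\n";       // prints "c2b1 -> 49841"
```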
CONTEXT
This code does not work: it does not show code points such as 177 (plusmn).
$msg = '';
$text = "... a UTF-8 long text... Ą ⨌ 2.5±0.1; 0.5±0.2 ...";
$allOrds = array();
for($i=0; $i<mb_strlen($text, 'UTF-8'); $i++) {
    $utf8char = mb_substr($text, $i, 1, 'UTF-8'); // 1=1 unicode character?
    $ord = hexdec( bin2hex($utf8char) ); //decimal
    if ($ord>126) { //non-ASCII
        if (isset($allOrds[$ord])) $allOrds[$ord]++; else $allOrds[$ord]=1;
    }
}
foreach($allOrds as $o=>$n)
    $msg.="\n entity #$o occurs $n times";
echo $msg;
OUTPUT
entity #50308 occurs 1 times
entity #14854284 occurs 1 times
entity #49841 occurs 2 times
So (see the entities table), 49841 is not 177 (plusmn), and 14854284 is not 10764 (iiiint).
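The numbers in the output match the UTF-8 byte sequences of the characters (C4 84, E2 A8 8C, C2 B1) read as plain integers, so one possible approach is to decode those bytes into a code point by hand. The following is only a minimal sketch: utf8_ord() is a hypothetical helper name, not a PHP built-in, and it assumes the input is reasonably well-formed UTF-8 (it does not reject overlong encodings or surrogates). On PHP 7.2 and later, the built-in mb_ord() should make a helper like this unnecessary.

```php
<?php
// utf8_ord() is a hypothetical helper, not part of PHP.
// Returns the Unicode code point of the first character in a UTF-8 string,
// or -1 if the string is empty or the byte sequence is visibly malformed.
function utf8_ord($char)
{
    if ($char === '') {
        return -1;
    }
    $b0 = ord($char[0]);
    if ($b0 < 0x80) {
        return $b0;                          // single byte: plain ASCII
    }
    // The lead byte tells how many bytes follow and carries the top bits.
    if (($b0 & 0xE0) === 0xC0) {
        $len = 2; $cp = $b0 & 0x1F;          // 110xxxxx
    } elseif (($b0 & 0xF0) === 0xE0) {
        $len = 3; $cp = $b0 & 0x0F;          // 1110xxxx
    } elseif (($b0 & 0xF8) === 0xF0) {
        $len = 4; $cp = $b0 & 0x07;          // 11110xxx
    } else {
        return -1;                           // stray continuation or invalid lead byte
    }
    if (strlen($char) < $len) {
        return -1;                           // truncated sequence
    }
    // Each continuation byte (10xxxxxx) contributes six more bits.
    for ($i = 1; $i < $len; $i++) {
        $b = ord($char[$i]);
        if (($b & 0xC0) !== 0x80) {
            return -1;
        }
        $cp = ($cp << 6) | ($b & 0x3F);
    }
    return $cp;
}

echo utf8_ord("\xC2\xB1"), "\n";       // ± : prints 177
echo utf8_ord("\xE2\xA8\x8C"), "\n";   // ⨌ : prints 10764
```

In the loop from the question, replacing hexdec( bin2hex($utf8char) ) with a call to such a helper should give the entity numbers from the table.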