I have a simple piece of JS code that I can't replicate in PHP when it comes to special characters.
This is the JS code (see JSFiddle for output):
var str = "t↙️"; //char "t" and special characters, emojis, etc..
document.write("Length is: "+str.length); // Length is: 19
for (var i = 0; i < str.length; i++) {
    document.write("<br> charCodeAt(" + i + "): " + str.charCodeAt(i));
}
The first problem is that PHP's strlen() and mb_strlen() already give different results from JS (strlen: 39, mb_strlen: 11). However, I managed to get the same length as JS with a custom JS_StringLength function (thanks to this SO answer).
Here is what I have in PHP so far (see phpFiddle for output):
<?php
function JS_StringLength($string) {
    return strlen(iconv('UTF-8', 'UTF-16LE', $string)) / 2;
}

function JS_charCodeAt($str, $index) {
    //not working!
    $char = mb_substr($str, $index, 1, 'UTF-8');
    if (mb_check_encoding($char, 'UTF-8')) {
        $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8');
        return hexdec(bin2hex($ret));
    } else {
        return null;
    }
}
$str = "t↙️";
echo $str."\n";
//echo "Length is: ".strlen($str)."\n"; //wrong
echo "Length is: ".JS_StringLength($str)."\n"; //OK
for ($i = 0; $i < JS_StringLength($str); $i++) {
    echo "charCodeAt(".$i."): ".JS_charCodeAt($str, $i)."\n";
}
After a full day of Googling and trying out everything I found, nothing gave the same results as JS.
What should JS_charCodeAt be to get the same output as JS with similar performance?
Experimenting #1:
I entered my string into https://r12a.github.io/app-conversion/ (awesome stuff). It looks like JS works with UTF-16 code units (19), while PHP strlen counts UTF-8 code units, i.e. bytes (39).
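To make the three different counts concrete, here is a minimal sketch on the JS side. The helper name stringLengths and the sample string ("t" plus U+1F600) are made up for illustration and are not the string from my examples above. It assumes an environment where TextEncoder is available (modern browsers, recent Node.js).

```javascript
// Hypothetical helper for illustration: reports the three "lengths" of a string.
function stringLengths(s) {
  return {
    utf16Units: s.length,                          // what JS .length and charCodeAt index into
    codePoints: Array.from(s).length,              // comparable to PHP mb_strlen($s, 'UTF-8')
    utf8Bytes: new TextEncoder().encode(s).length, // comparable to PHP strlen($s)
  };
}

// "t" followed by U+1F600, an astral-plane emoji chosen only as an example.
const sample = "t\u{1F600}";
const lengths = stringLengths(sample);
console.log(lengths); // { utf16Units: 3, codePoints: 2, utf8Bytes: 5 }

// charCodeAt returns one UTF-16 code unit per index, so the emoji
// shows up as a surrogate pair (high unit, then low unit).
const units = [];
for (let i = 0; i < sample.length; i++) {
  units.push(sample.charCodeAt(i));
}
console.log(units); // [ 116, 55357, 56832 ]
```

So any PHP equivalent of charCodeAt presumably has to index into UTF-16 code units rather than into code points (what mb_substr does) or bytes.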
Experimenting #2:
When I use json_encode() on my string, the result is, of course, close to what JavaScript might use internally. I even examined the original PHP source code of json_encode and how it escapes strings, but.. well..
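As a rough illustration of why the json_encode output looks so close: with default flags it writes non-ASCII characters as \uXXXX escapes, one per UTF-16 code unit. The sketch below mimics only that part in JS. The function name escapeLikeJsonEncode is hypothetical, and it deliberately ignores the surrounding quotes and the other escapes json_encode applies (such as for quotes, slashes and control characters).

```javascript
// Hypothetical helper for illustration: escapes every non-ASCII UTF-16 code unit
// as \uXXXX (lowercase hex), leaving ASCII characters untouched.
function escapeLikeJsonEncode(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i); // one UTF-16 code unit
    out += unit < 0x80
      ? s[i]
      : "\\u" + unit.toString(16).padStart(4, "0");
  }
  return out;
}

// "t" followed by U+1F600 (example emoji, not the string from the question).
console.log(escapeLikeJsonEncode("t\u{1F600}")); // t\ud83d\ude00
```

The hex values in the escapes (d83d, de00) are exactly the numbers charCodeAt returns (55357, 56832), just written in base 16.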
Before flagging this as a duplicate, please make sure you test a solution with the string in the above examples (or with random emojis): ALL the charCodeAt implementations I found here on Stack Overflow work with most special characters, but NOT with emojis.