I want to get the length of UTF-8 strings in PHP, but I don't have access to the cPanel host to enable the multibyte string functions (mbstring) in PHP. Is there any other way?
I also can't use the strlen() function, because it gives the wrong length for UTF-8 strings.
Well, then you have to write it yourself.
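In short, UTF-8 is encoded as follows: a byte that starts with 0 is a complete single-byte character; a lead byte that starts with n 1-bits followed by a 0 begins an n-byte character; and each of the remaining n-1 bytes of that character is a continuation byte that starts with 10.
For example, suppose we have the following string: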
Hëllo현World
```
01001000 ═ H  --> Starts with 0, so it's a single-byte character.
11000011 ╦ ë  --> Starts with two 1s followed by 0, so the character takes up 2 bytes.
         ║        This byte is the first of the 2 bytes. The remaining 1 byte
         ║        MUST start with 10.
10101011 ╝    --> This is a 'continuation' byte and MUST start with 10.
                  It does, so it's valid.
01101100 ═ l  --> This byte starts with 0, so it's a single-byte character again.
01101100 ═ l
01101111 ═ o
11101101 ╗    --> Starts with three 1s followed by 0, so the character takes up 3 bytes.
         ║        The next 3-1=2 bytes must start with 10.
10011000 ╬ 현 --> Continuation byte
10000100 ╝    --> Continuation byte
01010111 ═ W  --> Single-byte character
01101111 ═ o
01110010 ═ r
01101100 ═ l
01100100 ═ d
```
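The byte-by-byte walk above can be sketched in PHP as follows: read a lead byte, use its high bits to decide how many bytes the character occupies, and jump over its continuation bytes. This is only a sketch under stated assumptions: `utf8_sequence_length()` and `utf8_split()` are hypothetical helper names (not PHP built-ins), the input is assumed to be valid UTF-8, and the 4-byte case (lead byte `11110xxx`) is included for completeness even though the example string does not use it.

```php
<?php
// Hypothetical helper: number of bytes in the UTF-8 character that starts
// with $leadByte. Assumes valid UTF-8; no validation is performed.
function utf8_sequence_length(int $leadByte): int
{
    if ($leadByte < 0x80) {             // 0xxxxxxx: single-byte character
        return 1;
    }
    if (($leadByte & 0xE0) === 0xC0) {  // 110xxxxx: 2-byte character
        return 2;
    }
    if (($leadByte & 0xF0) === 0xE0) {  // 1110xxxx: 3-byte character
        return 3;
    }
    if (($leadByte & 0xF8) === 0xF0) {  // 11110xxx: 4-byte character
        return 4;
    }
    // Stray continuation byte (10xxxxxx) or invalid lead byte:
    // advance by 1 so the loop below can never stall.
    return 1;
}

// Hypothetical helper: split a UTF-8 string into its characters by
// jumping from one lead byte to the next.
function utf8_split(string $str): array
{
    $chars = [];
    $bytes = strlen($str);
    $i = 0;
    while ($i < $bytes) {
        $n = utf8_sequence_length(ord($str[$i]));
        $chars[] = substr($str, $i, $n);
        $i += $n;
    }
    return $chars;
}

$chars = utf8_split("Hëllo현World");
foreach ($chars as $c) {
    echo $c, " => ", strlen($c), " byte(s)\n";
}
echo "Characters: ", count($chars), "\n";
```

Splitting into characters does more work than counting, but it makes the lead-byte rule visible: `ë` comes out as a 2-byte string and `현` as a 3-byte string.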
It is sufficient to count all bytes that do not start with the bits 10. In other words, count every byte whose value is not in the range 128-191 inclusive.
```php
$str = "Hëllo현World";
// ë takes up 2 bytes
// 현 takes up 3 bytes
// In a decent browser you see 11 characters (ten Latin, one Korean)
$len = 0;
$bytes = strlen($str);
for ($i = 0; $i < $bytes; $i++) {
    $byte = ord($str[$i]);
    // Count every byte except continuation bytes (128-191, i.e. 10xxxxxx)
    if ($byte < 128 || $byte >= 192) {
        $len++;
    }
}
echo "Number of bytes: " . strlen($str) . "\n";   // 14
echo "Number of characters: " . $len . "\n";       // 11
```
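If you need this in more than one place, the byte-counting loop above can be wrapped in a reusable function. The sketch below does that and also shows a different technique swapped in for comparison: letting PHP's PCRE extension decode the string, where the `/u` modifier makes each `.` match one code point and `/s` lets `.` match newlines too. PCRE ships with PHP and is normally always enabled, so it does not depend on mbstring, but treat that as an assumption about your host. `utf8_length()` and `utf8_length_pcre()` are hypothetical names, not PHP built-ins.

```php
<?php
// Hypothetical helper: the byte-counting rule as a function.
// A byte is a continuation byte iff its top two bits are 10, i.e.
// ($byte & 0xC0) === 0x80, which is the same as 128 <= $byte <= 191.
// Assumes valid UTF-8; invalid input gives a meaningless count.
function utf8_length(string $str): int
{
    $len = 0;
    $bytes = strlen($str);
    for ($i = 0; $i < $bytes; $i++) {
        if ((ord($str[$i]) & 0xC0) !== 0x80) {
            $len++;
        }
    }
    return $len;
}

// Hypothetical helper: count code points with PCRE instead.
// preg_match_all() returns the number of matches, or false when the
// subject is not valid UTF-8 under the /u modifier; false maps to null here.
function utf8_length_pcre(string $str): ?int
{
    $count = preg_match_all('/./us', $str);
    return $count === false ? null : $count;
}

$str = "Hëllo현World";
echo utf8_length($str), "\n";       // 11
echo utf8_length_pcre($str), "\n";  // 11
```

The loop version never fails but trusts its input; the PCRE version rejects malformed UTF-8, which can be useful when the string comes from an untrusted source.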
PS: Is there a reason you don't want to enable multibyte strings?