Here is a PHP code snippet I came up with when I found a bug in my project.
print(($str == utf8_encode($str) ? "the same text" : "not the same text") . PHP_EOL);
print(mb_detect_encoding($str));
What this does is compare a string $str with its UTF-8 encoded version and tell me whether the two are the same; after that, it prints the encoding detected for the original string.
What I expected is one of two outcomes: either the original text is not UTF-8, so the UTF-8 encoded text differs from it, or the original text is already UTF-8 and therefore the UTF-8 encoded text is the same as the original.
But what really happened is the following output:
not the same text
UTF-8
This only happens if I set $str = array_keys($_POST)[0]; and use a key with special characters in my request body, like äöü=test, so that $str will be äöü (defining it directly in the code does not produce the same output).
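For anyone who wants to reproduce this without sending a request, here is a minimal sketch. It assumes the POST key reaches PHP as raw UTF-8 bytes, which I have not verified for my setup; the hex escapes and the variable $encoded are only there for illustration and are not part of my project code.

```php
<?php
// Assumption: the POST key arrives as UTF-8 bytes. The hex escapes below are
// the UTF-8 bytes of "äöü", so the result does not depend on the encoding
// this source file is saved in.
$str = "\xC3\xA4\xC3\xB6\xC3\xBC";

// Note: utf8_encode() is deprecated as of PHP 8.2 and may emit a notice there.
$encoded = utf8_encode($str); // hypothetical helper variable

// Same two checks as in the snippet above.
print(($str == $encoded ? "the same text" : "not the same text") . PHP_EOL);
print(mb_detect_encoding($str) . PHP_EOL);

// Dump the raw bytes of both strings to see where they differ.
print(bin2hex($str) . PHP_EOL);
print(bin2hex($encoded) . PHP_EOL);
```

Comparing the two hex dumps should show whether utf8_encode() changed any bytes, independent of how the terminal or browser renders the characters.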
From the output I conclude that the original character encoding is UTF-8, yet the two strings are not the same. If I print the initial string it is empty, and the encoded string would be äöü.
I don't understand how a string can be different when encoded with its own encoding. Can someone please explain this to me?