How can I convert an array of bytes into a UTF-8 string? I need this because I am extracting from a binary format.
-
possible duplicate of [How can I convert array of bytes to a string in PHP?](http://stackoverflow.com/questions/5473011/how-can-i-convert-array-of-bytes-to-a-string-in-php) – mario Sep 02 '12 at 19:01
-
I have tried `utf8_decode(pack("C*", array_slice($data, $i, $j)))`, but that's getting me nowhere ;) I'm a bit of a newbie at PHP. – HelloWorld Sep 02 '12 at 19:08
-
Not a duplicate. UTF-8 is quite different from ASCII, and I presume the method to decode it is as well. – HelloWorld Sep 02 '12 at 19:08
-
Yes, it *might* be different. But you need to show an actual example of how the encoding got mixed up by simple byte-as-char serializing. If you had UTF-8BE (not actually legal, but in absence of a better description *from you*) then it would require a pretty cumbersome workaround. (Don't bother asking, too broad for SO as few people answer a shallow one-liner question like yours.) – mario Sep 02 '12 at 22:11
1 Answer
A string is nothing more than an array of bytes. So a UTF-8 string is the very same as an array of bytes, except that in addition you know what the array of bytes represents.
So your input array of bytes needs one additional piece of information as well: the character set (character encoding). If you know the input character set, you can convert the array of bytes to another array of bytes representing a UTF-8 string.
The PHP function for doing that is called `mb_convert_encoding()`.
PHP itself does not know of character sets (character encodings). So a string really is nothing more than an array of bytes. The application has to know how to handle that.
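A minimal sketch of that point, assuming the mbstring extension is loaded: `strlen()` counts bytes, while `mb_strlen()` counts characters only because the caller tells it which encoding to assume. The variable names are purely illustrative.

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$s = "\xC3\xA9";

// strlen() counts bytes; PHP does not track an encoding for the string.
$byteCount = strlen($s);

// mb_strlen() counts characters, but only for the encoding passed in.
$asUtf8   = mb_strlen($s, 'UTF-8');
$asLatin1 = mb_strlen($s, 'ISO-8859-1');

var_dump($byteCount, $asUtf8, $asLatin1);
```

The same two bytes count as one character when read as UTF-8 and as two when read as ISO-8859-1, so the application has to supply the encoding.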
So if you have an array of bytes and want to turn that into a PHP string in order to convert the character set using `mb_convert_encoding()`, try the following:
    $input = array(0x53, 0x68, 0x69);
    $output = '';
    for ($i = 0, $j = count($input); $i < $j; ++$i) {
        $output .= chr($input[$i]);
    }
    $output_utf8 = mb_convert_encoding($output, 'utf-8', 'enter input encoding here');
(Instead of the single example above, have a look at more examples at https://stackoverflow.com/a/5473057/530502.)
`$output_utf8` will then be a PHP string containing the input array of bytes converted to UTF-8.
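As a possible alternative to the `chr()` loop, the bytes can be packed in one call. This is only a sketch: `bytesToUtf8()` is a hypothetical helper name, the ISO-8859-1 source encoding is an assumption chosen for illustration, and the `...` argument unpacking requires PHP 5.6 or later.

```php
<?php
/**
 * Hypothetical helper: turn an array of byte values (0-255) into a UTF-8 string.
 * $sourceEncoding names the encoding the bytes are already in.
 */
function bytesToUtf8(array $bytes, $sourceEncoding)
{
    // pack('C*', ...) expects one argument per byte, so the array is unpacked.
    $raw = pack('C*', ...$bytes);

    // Convert from the known source encoding to UTF-8.
    return mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
}

// The bytes from the example above; ASCII bytes are the same in ISO-8859-1 and UTF-8.
$text = bytesToUtf8(array(0x53, 0x68, 0x69), 'ISO-8859-1');

// 0xE9 is "é" in ISO-8859-1; in UTF-8 it becomes the two bytes 0xC3 0xA9.
$accented = bytesToUtf8(array(0xE9), 'ISO-8859-1');
```

This may also explain why the attempt in the comments got nowhere: `pack("C*", ...)` was handed the whole array as a single argument instead of one argument per byte, and `utf8_decode()` converts from UTF-8 to ISO-8859-1, which is the opposite direction.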