8

Here is the PHP code:

$arr=array(228,184,173,230,150,135,99,104,105,110,101,115,101);
$str='';
foreach ($arr as $i){
    $str.=chr($i);
}
print $str;

The output is: 中文chinese

Here is the JavaScript code:

var arr=[228,184,173,230,150,135,99,104,105,110,101,115,101];
var str='';
for (i in arr){
    str+=String.fromCharCode(arr[i]);
}
console.log(str);

The output is: 中æchinese

So how should I process the array in JavaScript?

solomon_wzs
  • When I run the PHP code, I get the output `中文chinese`. Is there anything special about your PHP configuration? – Stegrex Dec 25 '12 at 06:10
  • I get the same exact output as @Stegrex – PhearOfRayne Dec 25 '12 at 06:12
  • @Stegrex Maybe it is a problem with the locale setting. You could try uncommenting the `zh_CN.XXX` line in `/etc/locale.gen` – solomon_wzs Dec 25 '12 at 06:23
  • I am not sure how it works out in your PHP code, but for JavaScript the correct array is `[20013,25991,99,104,105,110,101,115,101]` – Ravi Y Dec 25 '12 at 06:25
  • @Stegrex: you are viewing it in ASCII. Interpret it as UTF-8. – DCoder Dec 25 '12 at 06:27
  • @Stegrex, @ 0DEFACED, add this line before the print: `header('Content-type:text/html; charset=UTF-8')`. Open it with a proper browser. – Pacerier Jul 27 '15 at 08:48

5 Answers

20

JavaScript strings consist of UTF-16 code units, yet the numbers in your array are the bytes of a UTF-8 string. Here is one way to convert the string, which uses the decodeURIComponent() function:

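var i, str = '';

// Percent-encode each byte as two hex digits, e.g. 228 becomes "%e4".
for (i = 0; i < arr.length; i++) {
    str += '%' + ('0' + arr[i].toString(16)).slice(-2);
}
// decodeURIComponent() interprets the percent-encoded bytes as UTF-8.
str = decodeURIComponent(str);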

Performing the UTF-8 to UTF-16 conversion in the conventional way is likely to be more efficient but would require more code.
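
As a rough illustration of what such a conventional conversion might look like, here is a minimal sketch. The function name `utf8BytesToString` is hypothetical (it is not part of the answer above), and the code assumes the input is valid UTF-8 and does no error checking.

```javascript
// Hypothetical helper: decodes an array of UTF-8 bytes into a JavaScript
// string. Assumes the input is valid UTF-8; malformed input is not detected.
function utf8BytesToString(bytes) {
    var out = '';
    var i = 0;
    while (i < bytes.length) {
        var b = bytes[i++];
        var cp;
        if (b < 0x80) {
            cp = b; // 1 byte: ASCII
        } else if (b < 0xe0) {
            // 2 bytes: 110xxxxx 10xxxxxx
            cp = (b & 0x1f) << 6 | (bytes[i++] & 0x3f);
        } else if (b < 0xf0) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            cp = (b & 0x0f) << 12 | (bytes[i++] & 0x3f) << 6 | (bytes[i++] & 0x3f);
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            cp = (b & 0x07) << 18 | (bytes[i++] & 0x3f) << 12 |
                 (bytes[i++] & 0x3f) << 6 | (bytes[i++] & 0x3f);
        }
        if (cp > 0xffff) {
            // Code points above U+FFFF need a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += String.fromCharCode(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
        } else {
            out += String.fromCharCode(cp);
        }
    }
    return out;
}

var arr = [228,184,173,230,150,135,99,104,105,110,101,115,101];
console.log(utf8BytesToString(arr)); // 中文chinese
```

The loop reads one lead byte, uses its high bits to decide how many continuation bytes follow, and joins the payload bits into a single code point before appending it to the result.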

PleaseStand
6
var arry = [3,5,7,9];
console.log(arry.map(String));

The result will be ['3','5','7','9'].

var arry = ['3','5','7','9'];
console.log(arry.map(Number));

The result will be [3,5,7,9].

Opal
ingrid
3

Another solution, without decodeURIComponent, for characters of up to 3 bytes (up to U+FFFF). The function presumes the input is valid UTF-8 and does little error checking.

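function atos(arr) {
    // Loop on the index rather than on the byte value, so a 0 byte does not end decoding early.
    for (var i = 0, l = arr.length, s = '', c; i < l;) {
        c = arr[i++];
        s += String.fromCharCode(
            c > 0xdf && c < 0xf0 && i < l - 1
                ? (c & 0xf) << 12 | (arr[i++] & 0x3f) << 6 | arr[i++] & 0x3f  // 3-byte sequence
            : c > 0x7f && i < l
                ? (c & 0x1f) << 6 | arr[i++] & 0x3f                           // 2-byte sequence
            : c                                                               // single byte (ASCII)
        );
    }

    return s;
}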
smrtl
  • I tested this with Chinese, Russian, Hebrew and English, and it works. The code is not very readable, but it's the right approach. – Rob H Aug 03 '17 at 19:42
3

If the array already holds character codes (UTF-16 code units) rather than UTF-8 bytes, a concise way these days is the following:

function bufferToString(arr){
    return arr.map(function(i){return String.fromCharCode(i)}).join("")
}
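
Note that `String.fromCharCode` treats each number as one UTF-16 code unit, so the function above does not combine multi-byte UTF-8 sequences. For UTF-8 bytes such as those in the question, here is a minimal sketch using the built-in `TextDecoder`, assuming an environment that provides it (modern browsers and recent Node.js versions do). The name `bytesToString` is hypothetical and chosen only for illustration.

```javascript
// Sketch: decode an array of UTF-8 bytes with the built-in TextDecoder.
// bytesToString is a hypothetical name, not part of any library.
function bytesToString(arr) {
    // TextDecoder works on binary buffers, so wrap the plain array first.
    return new TextDecoder('utf-8').decode(new Uint8Array(arr));
}

console.log(bytesToString([228,184,173,230,150,135,99,104,105,110,101,115,101])); // 中文chinese
```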
Community
Sancarn
1

In UTF-8, a Chinese character is encoded as more than one byte. When you do this

for (i in arr){
    str+=String.fromCharCode(arr[i]);
}

You are converting each byte to a separate character (actually a one-character string) and appending it to the string str. What you need to do is pack the bytes of each character together.

I changed your array to this and it worked for me:

var arr=[20013,25991,99,104,105,110,101,115,101];

I got these codes from here.

You can also take a look at this for packing bytes to a string.
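
To illustrate what packing the bytes means, here is a small sketch that derives the first code point of the changed array (20013) from the first three bytes of the original array. The variable names are hypothetical, and the sketch assumes the three bytes form one valid 3-byte UTF-8 sequence.

```javascript
// The first three bytes of the original array encode a single character.
var b0 = 228, b1 = 184, b2 = 173;

// A 3-byte UTF-8 sequence has the layout 1110xxxx 10xxxxxx 10xxxxxx.
// Mask off the marker bits and join the remaining x bits.
var codePoint = (b0 & 0x0f) << 12 | (b1 & 0x3f) << 6 | (b2 & 0x3f);

console.log(codePoint);                      // 20013
console.log(String.fromCharCode(codePoint)); // 中
```

The same calculation on the next three bytes (230, 150, 135) gives 25991, the second entry of the changed array.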

Community
Amit Khanna