let uint8Array = new Uint8Array([228, 189, 160, 229, 165, 189]);
alert( new TextDecoder().decode(uint8Array) ); // 你好
How does decoding these bytes end up producing Chinese characters?
As far as I know, UTF-8 is 8-bit, so when I look at a UTF-8 character map I don't see any Asian characters up to 255.
Investigating the bits:
- finding bits for the input
[228, 189, 160, 229, 165, 189].map(i => i.toString(2))
// ["11100100", "10111101", "10100000", "11100101", "10100101", "10111101"]
- finding bits for the output
[...'你好'].map(ch => ch.codePointAt(0).toString(2))
// ["100111101100000", "101100101111101"]
Things that are a mystery to me:
- The input has 48 bits in total, while the output has only 30. Why?
- Also, the bit patterns match in places but not as a whole. For example, the trailing bits of the 3rd and 6th input bytes (100000 and 111101) appear at the end of the two output values.
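One way to line the two bit arrays up is sketched below. It assumes the bytes are UTF-8 and that each character here uses the three-byte layout 1110xxxx 10xxxxxx 10xxxxxx, where only the x bits carry data. The helper name payloadBits is made up for this illustration and is not part of any API.

```javascript
// Hypothetical helper: assumes `bytes` holds only 3-byte UTF-8 sequences
// (a lead byte 1110xxxx followed by two continuation bytes 10xxxxxx).
function payloadBits(bytes) {
  const result = [];
  for (let i = 0; i < bytes.length; i += 3) {
    const [b0, b1, b2] = bytes.slice(i, i + 3);
    // Mask off the marker bits and join the 4 + 6 + 6 = 16 payload bits.
    const codePoint = ((b0 & 0x0f) << 12) | ((b1 & 0x3f) << 6) | (b2 & 0x3f);
    result.push(codePoint.toString(2));
  }
  return result;
}

const bits = payloadBits([228, 189, 160, 229, 165, 189]);
console.log(bits); // ["100111101100000", "101100101111101"]
```

Under that assumption, the 48 input bits would be 2 × (8 marker bits + 16 payload bits), and toString(2) drops the leading zero of each 16-bit value, which would account for the 30 bits in the output.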
Is there something I am missing? Feel free to correct me.