Consider the Chinese character "語" - its UTF-8 encoding is three bytes:
11101000 10101010 10011110
While its UTF-16 encoding is shorter, at two bytes:
10001010 10011110
I'd love to understand why the UTF-8 encoding is bigger. I've done some research, but I'm still unclear, so let me break down what I want to understand:
- How are the high bits of the first byte used to indicate how many bytes a character needs?
- What is the code point of 語?
- I'd appreciate it if you could walk through encoding the above character in both UTF-8 and UTF-16, in simple terms, so I can see why UTF-16 is smaller. (The snippet after this list shows how I produced the byte sequences above.)
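For reference, here is the minimal Python sketch I used to check the encodings (it assumes Python 3.8+, since that's when `bytes.hex()` gained its separator argument):

```python
text = "語"

# UTF-8 encoding of this character: 3 bytes
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.hex(" "), len(utf8_bytes))    # e8 aa 9e 3

# UTF-16 big-endian, without a BOM: 2 bytes
utf16_bytes = text.encode("utf-16-be")
print(utf16_bytes.hex(" "), len(utf16_bytes))  # 8a 9e 2

# The Unicode code point of the character
print(hex(ord(text)))                          # 0x8a9e

# Binary view, matching the bit patterns at the top of this question
print(" ".join(f"{b:08b}" for b in utf8_bytes))   # 11101000 10101010 10011110
print(" ".join(f"{b:08b}" for b in utf16_bytes))  # 10001010 10011110
```

So the bytes I quoted above come from this output; what I'm missing is the reasoning that connects the code point to those two different bit patterns.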