An 8-bit `char` can hold at most 256 distinct values. But Unicode has over a million code points (U+0000 through U+10FFFF), so its characters obviously can't all fit into a single `char`. They have to be encoded in such a way that one character can span multiple `char`s.
Your editor/compiler is most likely storing your example string in UTF-8 encoding. In UTF-8, ASCII characters take 1 `char` each, but every non-ASCII character takes 2 to 4 `char`s.
In your example, in UTF-8, `sizeof(chars)` would be 55+1 = 56 `char`s (+1 for the null terminator), even though you see only 29 "characters" (if you count the spaces), where:
- (space) = 0x20 (18x)
- 😎 = 0xF0 0x9F 0x98 0x8E
- 🥸 = 0xF0 0x9F 0xA5 0xB8
- 🤩 = 0xF0 0x9F 0xA4 0xA9
- 🥳 = 0xF0 0x9F 0xA5 0xB3
- 必 = 0xE5 0xBF 0x85
- 西 = 0xE8 0xA5 0xBF
- ♠ = 0xE2 0x99 0xA0
- ♬ = 0xE2 0x99 0xAC
- ♭ = 0xE2 0x99 0xAD
- ♮ = 0xE2 0x99 0xAE
- ♯ = 0xE2 0x99 0xAF