
While reading the Strings and Characters chapter of The Swift Programming Language, I couldn't understand how U+203C (‼, DOUBLE EXCLAMATION MARK) can be represented by (226, 128, 188) in UTF-8.

How does that encoding work?

Possible duplicate of [Manually converting unicode codepoints into UTF-8 and UTF-16](https://stackoverflow.com/questions/6240055/manually-converting-unicode-codepoints-into-utf-8-and-utf-16) – Jul 15 '17 at 15:14

1 Answer


I hope you already know how UTF-8 reserves certain bits in each byte to indicate that a Unicode character occupies several bytes. (This website can help.)
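For reference, these are the standard UTF-8 byte layouts; the x positions carry the bits of the code point:

1 byte:  0xxxxxxx                             (U+0000  – U+007F)
2 bytes: 110xxxxx 10xxxxxx                    (U+0080  – U+07FF)
3 bytes: 1110xxxx 10xxxxxx 10xxxxxx           (U+0800  – U+FFFF)
4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  (U+10000 – U+10FFFF)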

First, write 0x203C in binary:

0x203C = 10000000111100

That is 14 significant bits, more than the 11 data bits a two-byte UTF-8 sequence can carry, so with the "header bits" of the UTF-8 encoding scheme it takes 3 bytes (16 data bits) to encode:

0x203C =         0010     000000     111100

             1st byte   2nd byte   3rd byte
             --------   --------   --------
header       1110       10         10
actual data      0010     000000     111100
-------------------------------------------
full byte    11100010   10000000   10111100
decimal           226        128        188
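To double-check the arithmetic in Swift itself, here is a minimal sketch (the string literal and variable names are just illustrative) that prints the bytes both via String's built-in UTF-8 view and via the bit shuffling described above:

let doubleBang = "\u{203C}"          // ‼ DOUBLE EXCLAMATION MARK

// Swift exposes a string's UTF-8 encoding directly:
print(Array(doubleBang.utf8))        // [226, 128, 188]

// The same three bytes, built by hand from the code point:
let scalar: UInt32 = 0x203C
let byte1 = UInt8(0b1110_0000 | (scalar >> 12))          // header 1110 + top 4 data bits
let byte2 = UInt8(0b1000_0000 | ((scalar >> 6) & 0x3F))  // header 10 + middle 6 data bits
let byte3 = UInt8(0b1000_0000 | (scalar & 0x3F))         // header 10 + low 6 data bits
print(byte1, byte2, byte3)           // 226 128 188

Both print the same three values, so the table above matches what Swift's String.utf8 view actually produces.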