It seems you're mixing up two things: the character set (Unicode) and its encoding (UTF-8 or UTF-16).
0x2124 is only the 'sequence number' of the character in the Unicode table. Unicode is essentially nothing more than a big list of such sequence numbers mapped to characters. Such a sequence number is called a code point, and it is usually written as a hexadecimal number (here: U+2124).
How that code point is actually encoded as bytes is a separate question, and the encoded form may take up more bytes than the raw code point value would suggest.
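A quick way to see the difference, using Python purely as an illustration: the code point is a single number, but the byte sequence you get depends on the chosen encoding.

```python
ch = "\u2124"                        # ℤ, code point U+2124

print(hex(ord(ch)))                  # 0x2124 -- the code point itself
print(ch.encode("utf-8").hex())      # e284a4 -- 3 bytes in UTF-8
print(ch.encode("utf-16-be").hex())  # 2124   -- 2 bytes in UTF-16 (big-endian)
```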
A short calculation of the UTF-8 encoding of this character:
To know which bytes belong to the same character, UTF-8 uses a system where the first byte starts with a certain number of `1` bits (let's call it N) followed by a `0` bit. N is the number of bytes the character takes up. The remaining N − 1 bytes each start with the bits `10`.
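To make that rule concrete, here is a small sketch (Python, function name chosen for illustration) that reads N back out of the first byte of an encoded character:

```python
def utf8_length(first_byte: int) -> int:
    """Number of bytes in the sequence that starts with this byte."""
    if first_byte < 0b10000000:            # 0xxxxxxx -> plain ASCII, one byte
        return 1
    # (a continuation byte 10xxxxxx is not a valid first byte and is not handled here)
    n = 0
    while first_byte & (0b10000000 >> n):  # count the leading 1 bits
        n += 1
    return n                               # 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4

print(utf8_length(0xE2))  # 3, the first byte of ℤ's UTF-8 encoding (see below)
```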
Hex 0x2124 = binary 100001 00100100
According to the rules above, this converts to the following UTF-8 encoding:
```
11100010 10000100 10100100  <-- Our UTF-8 encoded result
AaaaBbDd CcDddddd CcDddddd  <-- Some notes, explained below
```
- `A` is a set of ones followed by a zero, which denotes the number of bytes belonging to this character (three `1`s = three bytes).
- `B` is padding, because otherwise the total number of bits would not be divisible by 8.
- `C` marks the continuation bytes (each subsequent byte starts with `10`).
- `D` holds the actual bits of our code point.
So indeed, the character ℤ takes up three bytes when encoded in UTF-8.
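To double-check the hand calculation, the same three bytes can be rebuilt with a few shifts and masks that mirror the A/B/C/D layout above (a minimal Python sketch, valid only for code points in the three-byte range U+0800..U+FFFF):

```python
cp = 0x2124

byte1 = 0b11100000 | (cp >> 12)          # 1110 + top 4 of the 16 bits -> 0xE2
byte2 = 0b10000000 | ((cp >> 6) & 0x3F)  # 10   + next 6 bits          -> 0x84
byte3 = 0b10000000 | (cp & 0x3F)         # 10   + last 6 bits          -> 0xA4

print(bytes([byte1, byte2, byte3]))      # b'\xe2\x84\xa4'
print("\u2124".encode("utf-8"))          # b'\xe2\x84\xa4' -- same result
```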