I am learning about UTF-16 encoding, and I have read that if you want to represent code points in the range U+10000 to U+10FFFF, you have to use surrogate pairs, whose two code units lie in the range 0xD800 to 0xDFFF.
So let's say I want to encode the code point U+10123 (10000000100100011 in binary).
First I lay out this sequence of bits:
110110xxxxxxxxxx 110111xxxxxxxxxx
Then I subtract 0x10000 from the code point and fill the x places with the resulting 20-bit value: 0x10123 - 0x10000 = 0x00123, which is 00000000000100100011 in binary, so I get:
1101100000000000 1101110100100011 (D800 DD23 in hexadecimal)
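To double-check my math, I wrote this small Python sketch of the procedure (the function name and layout are my own, not from any library):

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    assert 0x10000 <= code_point <= 0x10FFFF
    offset = code_point - 0x10000     # 20-bit value after subtracting 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low

print([hex(unit) for unit in to_surrogate_pair(0x10123)])  # ['0xd800', '0xdd23']
```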
I have also read that the code points in the range U+D800 to U+DFFF were removed from the Unicode character set (they are permanently reserved, so no character will ever be assigned to them), but I don't understand why this range had to be sacrificed!
I mean, this range could seemingly be encoded in 4 bytes as well. For example, if I skip the 0x10000 offset (U+D812 is below U+10000, so there is nothing to subtract anyway) and fill the x places directly with the code point U+D812 (1101100000010010 in binary), I get:
1101100000110110 1101110000010010 (D836 DC12 in hexadecimal)
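Real encoders refuse to do this, though. For instance, CPython's built-in UTF-16 codec (just one concrete example) raises an error on a lone surrogate:

```python
# A lone surrogate is not a valid Unicode scalar value, so the
# built-in UTF-16 codec refuses to encode it.
try:
    "\ud812".encode("utf-16-be")
except UnicodeEncodeError as exc:
    print(exc)  # ... can't encode character '\ud812' ...: surrogates not allowed
```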
Note: I was using UTF-16 Big Endian in my examples.
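Here is how I verified the byte order (again with CPython's built-in codec; UTF-16 Little Endian is shown only for contrast):

```python
# The same surrogate pair, serialized with each byte order.
print("\U00010123".encode("utf-16-be").hex())  # d800dd23 (big-endian, as above)
print("\U00010123".encode("utf-16-le").hex())  # 00d823dd (little-endian)
```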