
I was reading the Python guide about Unicode. In this section, it says:

To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 to 0x10ffff. This sequence needs to be represented as a set of bytes (meaning, values from 0-255) in memory. The rules for translating a Unicode string into a sequence of bytes are called an encoding.

The first encoding you might think of is an array of 32-bit integers. In this representation, the string “Python” would look like this:

   P           y           t           h           o           n
0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00
   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
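
As a quick check (this snippet is mine, not part of the guide), the same layout comes out of Python's built-in utf-32-le codec, i.e. little-endian UTF-32 without a byte order mark:

    >>> data = "Python".encode("utf-32-le")
    >>> list(data)                # one 4-byte (32-bit) unit per code point
    [80, 0, 0, 0, 121, 0, 0, 0, 116, 0, 0, 0, 104, 0, 0, 0, 111, 0, 0, 0, 110, 0, 0, 0]
    >>> data.decode("utf-32-le")  # decoding applies the rules in reverse
    'Python'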

Why might we think of 32-bit integers if code points are numbers from 0 to 0x10ffff? Is it maybe assuming that we are on a 32-bit system?
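
Here is the arithmetic behind my doubt (again my own check, not from the guide):

    >>> (0x10FFFF).bit_length()  # bits needed for the largest code point
    21
    >>> 0x10FFFF > 0xFFFF        # too large for 16 bits
    True

So 21 bits per code point would suffice in principle, yet the example spends 32.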

floatingpurr
  • 16 bits are not enough. The next number is 32, isn't it? – Hermann Döppes Dec 21 '15 at 11:50
  • Thanks! My problem was this: since code points run from 0 to 1,114,111 (0x10ffff), we would need only 21 bits to encode them all. But that 32-bit format is probably due to computer architectures. Is that right? – floatingpurr Dec 21 '15 at 12:02
  • I'd assume this, plus the fact that programmers tend toward powers of two (for the same reason). So the “first thing a programmer thinks of” would be a power of two. But there might be deeper reasons; the only way to be sure is to ask the authors, I guess. – Hermann Döppes Dec 21 '15 at 12:08
  • Possible duplicate of [Why UTF-32 exists whereas only 21 bits are necessary to encode every character?](http://stackoverflow.com/questions/6339756/why-utf-32-exists-whereas-only-21-bits-are-necessary-to-encode-every-character) – dan04 Dec 22 '15 at 23:59
  • Yep, I'm sorry. Do I have to delete this question? – floatingpurr Dec 24 '15 at 00:45

0 Answers