
I understand that 0x12345678 in big endian is 0x12 0x34 0x56 0x78 and 0x78 0x56 0x34 0x12 in little endian.

But what is this needed for? I don't fully understand how it works: it seems deceptively simple.

Is it really as simple as byte order, with no other difference?

user3897320

2 Answers


Your understanding of endianness appears to be correct.

I would like to additionally point out the implicit, conventional nature of endianness and its role in interpreting a byte sequence as some intended value.

> 0x12345678 in big endian is 0x12 0x34 0x56 0x78 and 0x78 0x56 0x34 0x12 in little endian.

Interestingly, you did not explicitly state what these 0x… entities above are supposed to mean. Most programmers who are familiar with a C-style language are likely to interpret 0x12345678 as a numeric value presented in hexadecimal form, and both 0x12 0x34 0x56 0x78 and 0x78 0x56 0x34 0x12 as byte sequences (where each byte is presented in hexadecimal form, and the left-most byte is located at the lowest memory address). And that is probably exactly what you meant.
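To make that distinction concrete, here is a minimal C sketch (C chosen only because of the C-style interpretation mentioned above) that stores the value 0x12345678 and then prints the byte sequence actually sitting in memory. The output depends on the endianness of the machine it runs on:

```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t value = 0x12345678;                    /* one numeric value            */
    unsigned char *bytes = (unsigned char *)&value; /* its in-memory representation */

    /* Prints "12 34 56 78" on a big-endian machine
       and "78 56 34 12" on a little-endian machine. */
    for (size_t i = 0; i < sizeof value; i++)
        printf("%02X ", bytes[i]);
    printf("\n");
    return 0;
}
```

The value is the same in both cases; only the byte sequence that represents it differs.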

Perhaps without even thinking, you have relied on a well-known convention (i.e. the assumption that your target audience will apply the same common knowledge as you would) to convey the meaning of these 0x… entities.

Endianness is very similar to this: a rule that defines for a given computer architecture, data transmission protocol, file format, etc. how to convert between a value and its representation as a byte sequence. Endianness is usually implied: Just as you did not have to explicitly tell us what you meant by 0x12345678, usually it is not necessary to accompany each byte sequence such as 0x12 0x34 0x56 0x78 with explicit instructions how to convert it back to a multi-byte value, because that knowledge (the endianness) is built into, or defined in, a specific computer architecture, file format, data transmission protocol, etc.
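For illustration, here is a small sketch (plain C; the helper names `put_be32` and `get_be32` are my own, not a standard API) of how a file format or protocol whose rule is "32-bit integers are stored most significant byte first" might encode and decode a value. Because the byte order is spelled out by the shifts, the result is the same on every host:

```
#include <stdint.h>

/* Encode: the format's rule says "most significant byte first" (big endian). */
static void put_be32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >>  8);
    out[3] = (unsigned char)(v);
}

/* Decode: reassemble the value according to the same rule. */
static uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] <<  8) |
            (uint32_t)in[3];
}
```

The endianness lives in the definition of the format, not in the CPU that happens to run the code.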

As to when endianness is necessary: Basically for all data types whose values don't fit in a single byte. That's because computer memory is conceptually a linear array of slots, each of which has a capacity of 8 bits (an octet, or byte). Values of data types whose representation requires more than 8 bits must therefore be spread out over several slots; and that's where the importance of the byte order comes in.
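As an aside, a common (if informal) way to check which byte order the host itself uses is to look at which slot the least significant byte of a multi-byte value lands in. A minimal sketch:

```
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t probe = 0x0001;
    unsigned char lowest_addressed_byte = *(unsigned char *)&probe;

    /* On a little-endian host the least significant byte (0x01) occupies
       the lowest address; on a big-endian host that slot holds 0x00. */
    printf("%s endian\n", lowest_addressed_byte == 0x01 ? "little" : "big");
    return 0;
}
```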

P.S.: Studying the Unicode character encodings UTF-16 and UTF-8 helped me build a deeper understanding of endianness.

  • While both encodings are for the exact same kind of data, endianness plays a role in UTF-16 but not in UTF-8. How can that be?

  • UTF-16 requires a byte order mark (BOM), while UTF-8 doesn't. Why?

Once you understand the reasons, chances are you'll have a very good understanding of endianness issues.
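To make the BOM hint a bit more concrete without giving the whole answer away: the code point U+FEFF comes out as two different byte sequences under the two UTF-16 byte orders, and that is exactly what a decoder can exploit. A minimal sketch (the BOM byte values are defined by Unicode; the sniffing logic is just an illustration):

```
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* The code point U+FEFF (the byte order mark) as a byte sequence: */
    const unsigned char bom_be[] = { 0xFE, 0xFF }; /* UTF-16, big endian    */
    const unsigned char bom_le[] = { 0xFF, 0xFE }; /* UTF-16, little endian */

    const unsigned char *stream = bom_le; /* pretend these bytes came from a file */

    /* A decoder can sniff the first two bytes to learn the stream's byte order.
       UTF-8 needs no such mark for this purpose, because its byte ordering is
       fixed by the encoding itself. */
    if (memcmp(stream, bom_be, 2) == 0)
        puts("UTF-16, big endian");
    else if (memcmp(stream, bom_le, 2) == 0)
        puts("UTF-16, little endian");
    else
        puts("no BOM: the byte order must be known by convention");
    return 0;
}
```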

stakx - no longer contributing
  • UTF-16 is a two-`byte` encoding while UTF-8 is one. So, UTF-8 does not need endianness, as its values never exceed what can be represented by a single byte (FF or 255). Similarly for the second one: UTF-8 is a single byte, so it doesn't worry about byte order. _Are these correct assumptions?_ – user3897320 Aug 02 '14 at 19:12
  • Sort of: A UTF-8 stream contains data in units of 8 bits, while UTF-16 contains data in units of 16 bits, and that's indeed why endianness matters for UTF-16. (But both are actually variable-length encodings. Unicode has >1M code points, so even UTF-16's 16 bits wouldn't suffice if it were simply "16 bits per code point".) What's interesting about UTF is how it makes provisions for "endianness interoperability". **1.** The BOM allows decoders to detect whether they use the correct endianness or whether they must additionally swap byte order. – stakx - no longer contributing Aug 02 '14 at 21:22
  • (cont'd:) When a decoder encounters the BOM, it translates it either to the code point `U+FEFF` (which is the BOM and therefore means that the decoder uses the correct endianness), or to `U+FFFE` (which is defined as an invalid character and therefore signals incorrect endianness during decoding). **2.** In UTF-8, even though one code point can take up 1 to 4 bytes in the byte stream, endianness does not matter! That's because the UTF-8 encoding *explicitly* defines the byte ordering. – stakx - no longer contributing Aug 02 '14 at 21:26
  • **P.S.** regarding (1) above: Instead of "the correct endianness", I should really have written, "the same endianness as the UTF-16 data stream". That is, if the BOM is present in a UTF-16 data stream, it amounts to an explicit declaration of which endianness the stream is using. – stakx - no longer contributing Aug 02 '14 at 21:40

It appears that your understanding of endianness is just fine.

Since there is more than one possible byte ordering for representing multi-byte data types' values in a linear address space, different CPU / computer manufacturers apparently chose different byte orderings in the past. Thus we have Big and Little Endian today (and perhaps other byte orderings that haven't got their own name).

Wikipedia has a good article on the matter, btw.

stakx - no longer contributing