
I was just looking at this answer which gives the following sample code to convert an int to an array of bytes:

int intValue = 1234; // any value; the original snippet left this unassigned
byte[] intBytes = BitConverter.GetBytes(intValue);
if (BitConverter.IsLittleEndian)
    Array.Reverse(intBytes);
byte[] result = intBytes;

I looked up Endianness and found that the reversal of bytes (or lack thereof) is at the level of a word, which does not have a fixed length.

Does the above code depend on int being the size of 1 word? If so, how would you write platform agnostic code?

As a side note, I am fairly sure I remember, back in the day, looking at the memory view of a debugger and having to reverse two sets of two bytes in order to reconstruct a 4-byte value. That is what got me thinking about this.

Aaron Anodide
  • In C#, [`int` is always `Int32`](https://msdn.microsoft.com/en-us/library/ya5y69ds.aspx), so there's no problem. – Blorgbeard May 01 '15 at 04:32
  • @Blorgbeard, but could an Int32 exist on an architecture with word size 2 bytes? – Aaron Anodide May 01 '15 at 05:27
  • @AaronAnodide Unless the CPU is [middle-endian](http://en.wikipedia.org/wiki/Endianness#Middle-endian), it wouldn't change what the word size is – xanatos May 01 '15 at 05:34
  • @xanatos, so are you saying that on a 32-bit CPU (i.e. 4-byte word size), a 64-bit integer will exist as contiguous bytes? In other words, endianness affects not only the byte order within a word, but also the word order when a value spans multiple words... I think I may be getting to a better understanding if this is the case... – Aaron Anodide May 01 '15 at 05:48
  • _"but could an Int32 exist on an architecture with word size 2 bytes"_ -- yes, just as `Int64` can and does exist, even when you are running your .NET program on an x86 (32-bit) platform. The platform architecture has nothing at all to do with the data formats available; at most, it affects what data formats the platform can handle most efficiently. – Peter Duniho May 01 '15 at 07:14

1 Answer


To correctly deal with endianness, you need to know two things: whether the data is big- or little-endian, and what size the given unit of data is.

This doesn't mean that you can't handle data of varying lengths. It does mean that you need to know what size data you're dealing with.

But this is true anyway. If you receive a series of bytes over the network (for example), you need to know how to interpret them. If you get 32 bytes, that could be text, it could be eight 32-bit integers, it could be four 64-bit integers, or whatever.

If you are expecting a 32-bit integer, then you need to handle endianness 32 bits (4 bytes) at a time. If you are expecting a 64-bit integer, then 8 bytes at a time. This has nothing to do with what a "word" is defined to be for your CPU architecture, or your language, or your managed run-time. It has everything to do with the protocol your code is dealing with.
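To make that concrete, here is a minimal sketch of handling endianness at an explicitly chosen width (the helper names are my own, not part of any standard API): each helper copies exactly as many bytes as the protocol's field occupies and reverses exactly that many on little-endian hosts, regardless of the CPU's word size.

```csharp
using System;

// Read a big-endian (network-order) 32-bit integer: always 4 bytes.
static int ReadInt32BigEndian(byte[] buffer, int offset)
{
    byte[] bytes = new byte[4];
    Array.Copy(buffer, offset, bytes, 0, 4);
    if (BitConverter.IsLittleEndian)
        Array.Reverse(bytes);            // reverse exactly 4 bytes
    return BitConverter.ToInt32(bytes, 0);
}

// Read a big-endian 64-bit integer: always 8 bytes.
static long ReadInt64BigEndian(byte[] buffer, int offset)
{
    byte[] bytes = new byte[8];
    Array.Copy(buffer, offset, bytes, 0, 8);
    if (BitConverter.IsLittleEndian)
        Array.Reverse(bytes);            // reverse exactly 8 bytes
    return BitConverter.ToInt64(bytes, 0);
}
```

The width is a property of the protocol field, not of the machine: the same code produces the same results on any architecture .NET runs on.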

Even within a given protocol, different pieces of data may be different sizes. You might have a mix of short, int, and long, and you need to accommodate that.
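As a sketch of that, here is a hypothetical big-endian wire format (the field names and layout are invented for illustration) with a 2-byte, a 4-byte, and an 8-byte field, each reversed at its own width:

```csharp
using System;
using System.IO;

// Hypothetical big-endian packet: 2-byte version, 4-byte length, 8-byte timestamp.
byte[] packet =
{
    0x00, 0x01,                              // version   = 1
    0x00, 0x00, 0x01, 0x00,                  // length    = 256
    0, 0, 0, 0, 0, 0, 0x02, 0x00             // timestamp = 512
};

// Read `count` bytes and flip them on little-endian hosts.
byte[] Take(BinaryReader r, int count)
{
    byte[] b = r.ReadBytes(count);
    if (BitConverter.IsLittleEndian)
        Array.Reverse(b);
    return b;
}

using var reader = new BinaryReader(new MemoryStream(packet));
short version   = BitConverter.ToInt16(Take(reader, 2), 0);
int   length    = BitConverter.ToInt32(Take(reader, 4), 0);
long  timestamp = BitConverter.ToInt64(Take(reader, 8), 0);
```

Each field is reversed at the size the protocol assigns to it; the CPU's word size never enters into it.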

It's exactly the same reason that e.g. BitConverter or BinaryReader consumes or generates a different number of bytes depending on what type of data you are retrieving. Here, instead of consuming or generating that different number of bytes, you reverse (or don't, if the platform endianness matches the protocol's) that same number of bytes.

In your example, if you passed a long to BitConverter.GetBytes(), the compiler would select the overload that takes a long instead of an int, and it would return eight bytes instead of four. Reversing the entire eight bytes would then be the right thing to do.
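Adapting the question's own snippet to a long shows this directly (the value here is just an example):

```csharp
using System;

long longValue = 0x0102030405060708;
byte[] longBytes = BitConverter.GetBytes(longValue); // 8 bytes, not 4
if (BitConverter.IsLittleEndian)
    Array.Reverse(longBytes);                        // reverse all 8 bytes
// longBytes now holds { 0x01, 0x02, ..., 0x08 }: big-endian order.
```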

Peter Duniho