0

I want to be able to detect the endianness of my system at runtime, programmatically.

In this question, there is an example of a function using 4 bytes to determine the 4 main types of endianness: BIG, LITTLE, BIG WORD, LITTLE WORD.

int endianness(void)
{
  uint8_t buffer[4];

  buffer[0] = 0x00;
  buffer[1] = 0x01;
  buffer[2] = 0x02;
  buffer[3] = 0x03;

  switch (*((uint32_t *)buffer)) {
  case 0x00010203: return ENDIAN_BIG;
  case 0x03020100: return ENDIAN_LITTLE;
  case 0x02030001: return ENDIAN_BIG_WORD;
  case 0x01000302: return ENDIAN_LITTLE_WORD;
  default:         return ENDIAN_UNKNOWN;
  }
}

My question is: are 4 bytes enough to conclude the endianness, or should one use more to be extra careful about future inventions (like big- or little-endian groupings of 3 or 4 bytes)?

My concern is that some unholy version of endianness might produce the same order of bytes as one of the ones presented, but under the hood actually be something different.

That being said, I feel like maybe it wouldn't matter as long as the results are reliable. For instance, if the largest type in my program is 4 bytes and it consistently produces the same signature as in the function above, then it shouldn't be a problem.

I am specifically asking about the type of testing shown in the example above.

basedchad21
  • 121
  • 7
  • 1
    There is no type large enough to handle every possible "unholy version of endianess". – Scott Hunter Apr 13 '23 at 13:38
  • 1
    Systems that neither use big nor little endian only have themselves to blame. If your system misbehaves on such exotic systems, it is a _feature_, not a problem. Or well, not _your_ problem, but a problem of the person who decided to use one. Keep giving them more problems until they change system and you'll be doing the world a favour. – Lundin Apr 13 '23 at 13:45
  • @basedchad21, "detect the endianness of my system at runtime," --> endian-ness of an `int` and `float` may differ - real systems have done this. Endian-ness is not a system property, but a type one. – chux - Reinstate Monica Apr 13 '23 at 14:06
  • @basedchad21, "I want to be able to detect the endianness" --> Why? I suspect that your desire to know masks a higher level problem that is the real issue. What problem does knowing the endianness solve? – chux - Reinstate Monica Apr 13 '23 at 14:13

2 Answers

2

What you're testing for should be sufficient; however, the way in which you're testing can trigger undefined behavior, because you're accessing an object of one type through an lvalue of an incompatible type (a strict-aliasing violation). You can sidestep this if you use a union:

union endian {
  uint8_t bytes[4];
  uint32_t word;
};

int endianness(void)
{
  union endian test = { .bytes = { 0, 1, 2, 3 } };

  switch (test.word) {
  case 0x00010203: return ENDIAN_BIG;
  case 0x03020100: return ENDIAN_LITTLE;
  case 0x02030001: return ENDIAN_BIG_WORD;
  case 0x01000302: return ENDIAN_LITTLE_WORD;
  default:         return ENDIAN_UNKNOWN;
  }
}
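For completeness, a `memcpy`-based variant avoids the aliasing question without needing a union; `memcpy` is the usual portable way to reinterpret an object representation. This is a sketch assuming the same `ENDIAN_*` constants as the question:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical constants matching the question's naming. */
enum { ENDIAN_UNKNOWN, ENDIAN_BIG, ENDIAN_LITTLE,
       ENDIAN_BIG_WORD, ENDIAN_LITTLE_WORD };

int endianness_memcpy(void)
{
  const uint8_t bytes[4] = { 0, 1, 2, 3 };
  uint32_t word;

  /* Copy the byte pattern into a uint32_t; this reads the
     object representation without violating aliasing rules. */
  memcpy(&word, bytes, sizeof word);

  switch (word) {
  case 0x00010203: return ENDIAN_BIG;
  case 0x03020100: return ENDIAN_LITTLE;
  case 0x02030001: return ENDIAN_BIG_WORD;
  case 0x01000302: return ENDIAN_LITTLE_WORD;
  default:         return ENDIAN_UNKNOWN;
  }
}
```

Modern compilers typically optimize the `memcpy` away, so this costs nothing at runtime.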
dbush
  • 205,898
  • 23
  • 218
  • 273
  • 1
    Or just go the other way around - access a single byte out of a 32 bit type, which is well-defined: `#define IS_LITTLE_ENDIAN (*(unsigned char*)&(uint32_t){1} == 1)` – Lundin Apr 13 '23 at 13:53
2

What the C standard says about the order of bytes in memory is in C 2018 6.2.6 2:

Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

This does not say there is any relationship between the order of bytes in a short and the order of bytes in an int, or in a long, a long long, a double, or other types. It does not say the order is constrained to only certain permissible orders, such as that one of the four orders you list must be used. There are 4! = 24 ways to order four bytes, and it would be permissible, according to the C standard, for a C implementation to use any one of those 24 for a four-byte int, and for the same C implementation to use any one of those 24, the same or different, for a four-byte long.

To fully test what orders a C implementation is using, you would need to test each byte in each type of object bigger than one byte.
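As a sketch of what such a per-type test could look like, the helpers below (names are my own, not from the answer) copy out and print the raw object representation of any object, so each type can be inspected separately:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the raw object representation of *obj into out[0..n-1]. */
static void get_bytes(unsigned char *out, const void *obj, size_t n)
{
  memcpy(out, obj, n);
}

/* Print the bytes of an object in memory order; caller ensures n <= 16. */
static void dump_bytes(const char *label, const void *obj, size_t n)
{
  unsigned char raw[16];
  get_bytes(raw, obj, n);
  printf("%-8s:", label);
  for (size_t i = 0; i < n; i++)
    printf(" %02x", raw[i]);
  printf("\n");
}
```

Calling `dump_bytes` on a `uint16_t` holding `0x0001`, a `uint32_t` holding `0x00010203`, a `double`, and so on shows the byte order each type actually uses, which need not be the same for all of them.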

In most C implementations, it suffices to assume bytes are in big-endian order (most significant byte first, then bytes in order of decreasing significance) or little-endian order (the reverse). In some C implementations, there may be a hybrid order due to the history of the particular C implementation—for example, its two-byte objects might have used one byte order due to hardware it originally ran on while its four-byte objects were constructed in software from two-byte objects that were ordered based on the programmer’s choice.

A similar situation can arise with larger objects, such as a 64-bit double stored as two 32-bit parts.

However, variants with other orders, such as the bytes 0, 1, 2, and 3 (denoted by significance) stored in the order 3, 0, 1, 2, would arise only in perverse C implementations that technically conform to the C standard but do not serve any practical purpose. Such possibilities can be ignored in ordinary code.

To explore all possibilities, you must also consider the order in which bits are stored within the bytes of an object. The C standard requires that “the same bits” be used for the same meaning only between corresponding signed and unsigned types, in C 2018 6.2.6.2 2:

… Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type…

Thus, a C implementation in which bits 3 and 4 of the first byte of an int represented 2³ and 2⁴ but represented 2⁴ and 2³ in a long would technically conform to the C standard. While this seems odd, note that the fact the standard specifically constrains this for corresponding signed and unsigned types but not for other types suggests there were C implementations that assigned different meanings to corresponding bits in different types.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312