
Will a C union of uint32_t and uint8_t[4] always map the same way on little-endian architectures?

e.g. with

union {
    uint32_t double_word;
    uint8_t octets[4];
} u;

will

u.double_word = 0x12345678;

always result in:

u.octets[0] == 0x78
u.octets[1] == 0x56
u.octets[2] == 0x34
u.octets[3] == 0x12

or is this undefined behaviour?

fadedbee
  • This is how endianness is defined. If you are on big endian, the byte array will be reversed – Morten Jensen Apr 03 '18 at 10:31
  • Yes, it's true *by definition*. What it means for a system to be "little endian" is that this relationship holds. The union behavior was historically undefined in C, but later defined as the result of a DR. Using it does have some caveats but you can always do the same safely with memcpy. – R.. GitHub STOP HELPING ICE Apr 03 '18 at 14:45
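The memcpy approach mentioned in the comment above can be sketched as follows. This is a minimal illustration, not code from the thread; the helper name `bytes_of_u32` is hypothetical. It copies the object representation of a uint32_t into a byte array, which yields the same bytes as reading the union's `octets` member.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of `value`
   into `out`. On a little-endian machine out[0] is the least
   significant byte; on a big-endian machine it is the most
   significant byte. No union is involved. */
void bytes_of_u32(uint32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}
```

Calling `bytes_of_u32(0x12345678, b)` on a little-endian machine fills `b` with 0x78, 0x56, 0x34, 0x12, exactly the values listed in the question.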

2 Answers


TL;DR: Yes, the code is fine.

As noted, the result depends on endianness, which is implementation-defined, but other than that, the behavior is well-defined and the code is portable between little-endian machines.
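One way to check this on a given machine is to compare the union's octets against bytes extracted with shifts, since shifts operate on values and do not depend on byte order. This is a sketch; the names `word_bytes`, `lsb_byte` and `union_maps_little_endian` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The union from the question, given a (hypothetical) type name. */
typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} word_bytes;

/* Byte i of `value`, counting from the least significant (i = 0).
   Computed arithmetically, so it is independent of byte order. */
static uint8_t lsb_byte(uint32_t value, int i)
{
    return (uint8_t)(value >> (8 * i));
}

/* True if, for every i, octets[i] holds the i-th least significant
   byte, i.e. the union lays the value out in little-endian order. */
bool union_maps_little_endian(uint32_t value)
{
    word_bytes u;
    u.double_word = value;
    for (int i = 0; i < 4; i++) {
        if (u.octets[i] != lsb_byte(value, i))
            return false;
    }
    return true;
}
```

`union_maps_little_endian(0x12345678)` returns true on a little-endian machine and false on a big-endian one.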


Detailed answer:

One important point is that the allocation order of an array is guaranteed, C11 6.2.5/20:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.

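This means that the array of 4 uint8_t occupies consecutive bytes starting at the union's address. Since every union member starts at that same address (C11 6.7.2.1/16), octets[i] is simply byte i of the uint32_t's object representation, which on a little-endian system means least significant byte first.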
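The overlay argument can be checked directly: both members share a starting address, and the octets match the uint32_t's bytes as seen through memcpy. A minimal sketch; `word_bytes` and `octets_overlay_representation` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} word_bytes;

/* All union members begin at the union's address (C11 6.7.2.1/16) and
   array elements are contiguous (C11 6.2.5/20), so octets[i] overlays
   byte i of double_word's object representation. */
bool octets_overlay_representation(uint32_t value)
{
    word_bytes u;
    u.double_word = value;

    /* Both members start at the same address. */
    if ((void *)&u.double_word != (void *)u.octets)
        return false;

    /* The bytes match the representation copied out with memcpy. */
    unsigned char repr[sizeof value];
    memcpy(repr, &value, sizeof value);
    return memcmp(repr, u.octets, sizeof repr) == 0;
}
```

This holds regardless of endianness; endianness only decides which value byte lands in octets[0].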

In theory, the compiler is free to add padding at the end of a union (C11 6.7.2.1/17), but that would not affect the data representation. If you want to pedantically protect against this, or, more relevantly, against problems if more members are added later, you can add a compile-time assert:

#include <stdint.h>

typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} u;

_Static_assert(sizeof(u) == sizeof(uint32_t), "union u: Padding detected");

As for the representation of the exact-width intN_t/uintN_t types, they are guaranteed to have no padding bits, and the signed ones are guaranteed to use 2's complement (C11 7.20.1.1).
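As an illustration of what the 2's complement guarantee buys you, the hypothetical helper below reads the bit pattern of an int32_t through memcpy. Because int32_t has no padding bits and uses 2's complement, the pattern is fully determined by the value, whatever the byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of an int32_t.
   int32_t and uint32_t have the same size and byte order, so the
   copy yields the 2's complement encoding of `value`. */
uint32_t bits_of_i32(int32_t value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, `bits_of_i32(-1)` is always 0xFFFFFFFF and `bits_of_i32(INT32_MIN)` is always 0x80000000.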

Finally, whether "type punning" through a union is allowed or is undefined behavior is specified somewhat vaguely in C11 6.5.2.3:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,95) and is an lvalue if the first expression is an lvalue.

The (non-normative) footnote 95 clarifies this:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

Since we have already ruled out padding bits, trap representations are not an issue.
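The same rule covers the opposite direction: store through `octets` and read through `double_word`. A minimal sketch with hypothetical names; every bit pattern is a valid uint32_t value, so the read cannot hit a trap representation.

```c
#include <stdint.h>

typedef union {
    uint32_t double_word;
    uint8_t octets[4];
} word_bytes;

/* Store through one member, read through the other: per C11 6.5.2.3
   (footnote 95) the stored bytes are reinterpreted as a uint32_t.
   uint32_t has no padding bits, so any byte values are acceptable. */
uint32_t u32_from_octets(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    word_bytes u;
    u.octets[0] = b0;
    u.octets[1] = b1;
    u.octets[2] = b2;
    u.octets[3] = b3;
    return u.double_word;
}
```

On a little-endian machine, `u32_from_octets(0x78, 0x56, 0x34, 0x12)` returns 0x12345678, the inverse of the mapping in the question.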

Lundin
  • @chux Oh, struct or union. You are right, this is a bit confusing to read. I'll edit! – Lundin Apr 03 '18 at 13:57
  • Fixed. It should make more sense now. – Lundin Apr 03 '18 at 14:02
  • I didn't realize the fixed size types guaranteed two's complement. That guarantee doesn't exist for other integer types. That could make for some very interesting conversions between `int` and `int32_t` for example. – dbush Apr 03 '18 at 14:13
  • @dbush the guarantee about `intN_t` being 2's complement also comes with the loophole that they are _optional_ types if the platform does not readily support them. When an `int` is not 2's complement, it is unlikely that `int32_t` exists, thus negating the [very interesting conversions](https://stackoverflow.com/questions/49627802/will-a-c-union-of-uint32-t-and-uint8-t4-will-always-map-the-same-way-on-little/49628541?noredirect=1#comment86271575_49628541). – chux - Reinstate Monica Apr 03 '18 at 17:57

On a platform that actually has both of these types, C11 §7.20.1.1 p2 gives you all the needed guarantees (given you know endianness):

The typedef name uintN_t designates an unsigned integer type with width N and no padding bits. Thus, uint24_t denotes such an unsigned integer type with a width of exactly 24 bits.

This is enough because there are no bytes with fewer than 8 bits, so having uint8_t available automatically means that a byte has exactly 8 bits.
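If you would rather have the build fail than rely on this reasoning silently, a few compile-time checks can be sketched like this. It assumes a C11 compiler for `_Static_assert`; `U32_BYTES` is a hypothetical name. It relies on `<stdint.h>` defining `UINT8_MAX` only when `uint8_t` is provided (C11 7.20).

```c
#include <limits.h>
#include <stdint.h>

/* UINT8_MAX is defined only if the implementation provides uint8_t. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this platform"
#endif

/* uint8_t has 8 value bits and no padding, and CHAR_BIT >= 8,
   so if uint8_t exists a byte must be exactly 8 bits. */
_Static_assert(CHAR_BIT == 8, "a byte is not 8 bits wide");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is not 4 bytes");

/* Number of bytes in a uint32_t; 4 whenever the checks above pass. */
enum { U32_BYTES = sizeof(uint32_t) };
```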

  • Thanks. I wanted to make sure that "accessing the value of a union, by a field other than that through which it was set" was allowed by standards rather than just common practice. – fadedbee Apr 03 '18 at 10:39
  • @fadedbee It's allowed in C and undefined behavior in C++. See https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior. – interjay Apr 03 '18 at 10:46