
According to this answer the following code invokes undefined behavior:

uint16_t *buf = malloc(16); // 8*sizeof(uint16_t)
buf[1] = *buf = some_value;
((uint32_t *)buf)[1] = *(uint32_t *)buf;
((uint64_t *)buf)[1] = *(uint64_t *)buf;

We may write any type to malloc()'d memory, but we may not read a previously written value as an incompatible type by casting pointers (with the exception of char).
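To illustrate the char exception, here is a minimal sketch that writes a uint16_t into malloc()'d storage and then reads that object's bytes through unsigned char, which 6.5p7 always permits. The function name byte_sum_u16 is hypothetical, and the sketch assumes 8-bit bytes (so a uint16_t occupies two of them). Summing the bytes keeps the result independent of byte order.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores `value` as a uint16_t in malloc()'d memory,
   then reads that object's bytes through unsigned char (always allowed).
   Returns the sum of the bytes, or 0 if allocation fails. */
unsigned byte_sum_u16(uint16_t value)
{
    uint16_t *buf = malloc(16);          /* room for 8 uint16_t */
    if (buf == NULL)
        return 0;

    buf[0] = value;                      /* effective type of these bytes: uint16_t */

    const unsigned char *bytes = (const unsigned char *)buf;
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof buf[0]; i++)
        sum += bytes[i];                 /* character-type access: no aliasing violation */

    free(buf);
    return sum;
}
```

With 8-bit bytes, byte_sum_u16(0x1234) yields 0x12 + 0x34 on both little- and big-endian machines.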

Could I use this union:

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

As such:

union Data *buf = malloc(16);
buf->u16[1] = buf->u16[0] = some_value;
buf->u32[1] = buf->u32[0];
buf->u64[1] = buf->u64[0];

In order to avoid undefined behavior from strict aliasing violations? Also, could I cast buf to any of uint16_t *, uint32_t *, or uint64_t * and then dereference it without invoking undefined behavior, since these types are all valid members of union Data? (I.e., is the following valid?)

uint16_t first16bits = *(uint16_t *)buf;
uint32_t first32bits = *(uint32_t *)buf;
uint64_t first64bits = *(uint64_t *)buf;

If not (i.e. the above code making use of union Data is still invalid), when can and cannot unions be used (in pointer casts or otherwise) to produce valid code that does not violate strict aliasing rules?
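For reference, a self-contained version of the union-based code from the question might look like the sketch below. The wrapper fill_via_union is hypothetical, some_value becomes its parameter, and sizeof *buf (16 on common platforms) replaces the literal 16. Because both halves of every wider member end up holding the same 16-bit pattern, the final value does not depend on byte order.

```c
#include <stdint.h>
#include <stdlib.h>

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Hypothetical wrapper around the question's sequence of member accesses.
   Returns the resulting u64[1], or 0 if allocation fails. */
uint64_t fill_via_union(uint16_t some_value)
{
    union Data *buf = malloc(sizeof *buf);
    if (buf == NULL)
        return 0;

    buf->u16[1] = buf->u16[0] = some_value;  /* bytes 0..3: value, value */
    buf->u32[1] = buf->u32[0];               /* copy bytes 0..3 into 4..7 */
    buf->u64[1] = buf->u64[0];               /* copy bytes 0..7 into 8..15 */

    uint64_t result = buf->u64[1];
    free(buf);
    return result;
}
```

For example, fill_via_union(0x1234) produces 0x1234123412341234.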

user16217248

2 Answers


Yes, it is acceptable to write one union member and read another. Section 6.5p7 of the C standard states:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

  • a type compatible with the effective type of the object,
  • a qualified version of a type compatible with the effective type of the object,
  • a type that is the signed or unsigned type corresponding to the effective type of the object,
  • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
  • a character type
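As a small illustration of the first three bullets, the sketch below reads an object whose effective type is int through lvalues of type int, const int, and unsigned int. The function name reads_agree is hypothetical. None of these reads is a strict aliasing violation, and for a non-negative value such as 42 all three agree.

```c
/* Hypothetical example: each read uses an lvalue type listed in 6.5p7
   for an object whose effective type is int. */
int reads_agree(int *p)
{
    int a = *p;                        /* type compatible with int */
    const int b = *(const int *)p;     /* qualified version of int */
    unsigned c = *(unsigned *)p;       /* corresponding unsigned type */
    return a == b && c == (unsigned)a;
}
```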

It is also safe to convert the address of a union to that of any of its members. From section 6.7.2.1p16:

The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
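A minimal sketch of what 6.7.2.1p16 implies for union Data: converting the union pointer to uint16_t *, uint32_t *, or uint64_t * yields the address of the first element of the corresponding array member, and converting back yields the original pointer. The function name union_pointer_conversions_agree is hypothetical; strictly, the clause speaks of pointers to the array members themselves, but the first-element addresses coincide on mainstream implementations.

```c
#include <stdint.h>

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Hypothetical check: returns 1 if the converted pointers match the
   member addresses and a value stored via a member reads back through
   the converted pointer. */
int union_pointer_conversions_agree(union Data *d)
{
    uint16_t *p16 = (uint16_t *)d;
    uint32_t *p32 = (uint32_t *)d;
    uint64_t *p64 = (uint64_t *)d;

    d->u32[0] = UINT32_C(0x01020304);    /* store through the u32 member */

    return p16 == &d->u16[0]
        && p32 == &d->u32[0]
        && p64 == &d->u64[0]
        && (union Data *)p64 == d        /* "and vice versa" */
        && *p32 == UINT32_C(0x01020304); /* same member, read via the pointer */
}
```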

dbush
  • I prefer to use http://port70.net/~nsz/c/c11/n1570.html in my citations. It's not the official standard, but it's in HTML format so you can use deep links to sections. – Barmar Sep 09 '22 at 20:53
  • It is perhaps notable that the dynamic allocation has nothing in particular to do with it. You can write to one member of a union and read from a different one regardless of how the union was allocated. – John Bollinger Sep 09 '22 at 20:56
  • One does have to be a bit careful, however, if the members do not all have the same size, because when you write to one member, the bytes of the union that are not used to store its value take unspecified values. Those can be and often are the same values they had before, but they don't have to be. That doesn't apply to the union in question here, however, as the relevant members are the arrays, not their elements. – John Bollinger Sep 09 '22 at 20:59
  • @JohnBollinger: The way the Standard describes Effective Types makes that unclear, since it has special rules for storage with no "declared type", and calls out dynamically-allocated storage as fitting that criterion, but fails to make clear whether it is unique in that regard. Some people seem to think C has a concept of an "active union member", but the Standard uses no such terminology and says nothing about how it would interact with code that forms the address of `someUnion.arrayMember+index` and/or uses the resulting pointer to access such storage. – supercat Sep 12 '22 at 21:10
  • @JohnBollinger: It's also necessary to be careful if union members don't all have the same alignment. If a union contains e.g. 32-bit and 16-bit types, casting e.g. a `uint16_t*` into a pointer-to-union type will invoke UB (*and may cause clang to yield code that doesn't work*) if the pointer isn't 32-bit aligned, even if only 16-bit and/or 8-bit members of the union are used for access. – supercat Sep 12 '22 at 21:41

The construct someUnion.someArray[i] is defined as meaning *(someUnion.someArray+i), with the latter being an access to an lvalue of the array element type that has no relation whatsoever to the union type.

C implementations will generally recognize a construct which is written using the array-bracket notation as having an association with the union type, even in cases where they would not do so if the construct were written using explicit pointer arithmetic syntax. Such special treatment for array-bracket notation, however, is purely up to the discretion of individual implementations.
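To make the two spellings concrete, the sketch below reads the same element once with array-bracket notation and once with explicit pointer arithmetic. By the standard's definition of [] the two are equivalent; the point above is that an implementation may, at its discretion, treat only the bracket form as associated with the union. The function names are hypothetical, and the union mirrors the one used in this answer.

```c
union quadbyte {
    unsigned char bb[4];
    unsigned short hh[2];
    unsigned int ww[1];
};

/* Bracket form: u->hh[1] */
unsigned short second_half_brackets(union quadbyte *u)
{
    return u->hh[1];
}

/* Equivalent pointer-arithmetic form: *(u->hh + 1) */
unsigned short second_half_arith(union quadbyte *u)
{
    return *(u->hh + 1);
}
```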

On the flip side, a pointer to an object may only be converted to a pointer to a union type containing that object if the pointer satisfies the alignment requirements of all members within the union, without regard for whether the members in question are accessed. On platforms that do not support unaligned access, clang will process a construct like:

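union quadbyte {
    unsigned char bb[4];
    unsigned short hh[2];
    unsigned int ww[1];
};

/* Combine the two 16-bit halves; the cast keeps the shift in unsigned
   arithmetic rather than shifting a promoted (signed) int. */
unsigned test(union quadbyte *src)
{
    return src->hh[0] | ((unsigned)src->hh[1] << 16);
}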

in a manner that will fail if src isn't properly aligned for type union quadbyte, even if it would be properly aligned for type unsigned short.
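One way to sidestep the alignment issue is sketched below, assuming 16-bit unsigned short and 32-bit unsigned int as the union layout implies. aligned_for_quadbyte tests alignment via uintptr_t, whose conversion result is implementation-defined (the remainder test works on common flat-address platforms, but that is an assumption). test_from_bytes copies the bytes into a properly aligned local union with memcpy instead of converting the pointer. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

union quadbyte {
    unsigned char bb[4];
    unsigned short hh[2];
    unsigned int ww[1];
};

/* Nonzero if p meets union quadbyte's alignment requirement
   (assumes uintptr_t preserves address arithmetic). */
int aligned_for_quadbyte(const void *p)
{
    return (uintptr_t)p % _Alignof(union quadbyte) == 0;
}

/* Alignment-agnostic variant: src may point anywhere, as long as
   sizeof(union quadbyte) bytes are readable there. */
unsigned test_from_bytes(const unsigned char *src)
{
    union quadbyte tmp;                  /* local object: correctly aligned */
    memcpy(&tmp, src, sizeof tmp);
    return tmp.hh[0] | ((unsigned)tmp.hh[1] << 16);
}
```

Copying into a local union costs a small memcpy, which compilers commonly turn into plain loads on targets that allow them.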

supercat