UB When Dereferencing Array of Unions

Question

Which of these are undefined behaviour:

template <class T> struct Struct { T t; };

template <class T> union Union { T t; };

template <class T> void function() {
  Struct<T> aS[10];
  Union<T> aU[10];

  // do something with aS[9].t and aU[9].t including initialization

  T *aSP = reinterpret_cast<T *>(aS);
  T *aUP = reinterpret_cast<T *>(aU);

  // so here is this undefined behaviour?
  T valueS = aSP[9];
  // use valueS in whatever way

  // so here is this undefined behaviour?
  T valueU = aUP[9];
  // use valueU in whatever way

  // is accessing aS[9].t or aU[9].t now UB?
}

So yeah, which of the last 3 operations is UB?

(My reasoning: I don't know about the struct, if there is any requirement for its size to be the same as its single element, but AFAIK the union has to be the same size as the element. Alignment requirements I don't know for the union, but I am guessing it is the same. For the struct I have no idea. In the case of the union I would guess that it is not UB, but as I said, I am really really not sure. For the struct I actually have no idea)

Instead of having us do your homework for you, tell us what _you_ think and why and ask us to correct or confirm some specific reasoning. — Lightness Races in Orbit, Apr 23 '19 at 16:03
Also see https://stackoverflow.com/a/25377970/560648 and https://en.cppreference.com/w/cpp/types/is_standard_layout. And what is `T`? You did not construct any `T`s but depending on what `T` is that may not matter... but you have to provide all necessary information and context. — Lightness Races in Orbit, Apr 23 '19 at 16:05

Michael Kenzel · Accepted Answer · 2019-05-07T14:33:42.507

tl;dr: the last two statements in your code above (the reads through aSP[9] and aUP[9]) will always invoke undefined behavior. Simply casting a pointer to a union to a pointer to one of its member types is generally fine because it doesn't really do anything (it's unspecified at worst, but never undefined behavior). Note that this is only about the cast itself; using the result of the cast to access an object is a whole different story.

Depending on what T ends up being, Struct<T> may potentially be a standard-layout struct [class.prop]/3 in which case

T *aSP = reinterpret_cast<T *>(aS);

would be well-defined because a Struct<T> would be pointer-interconvertible with its first member (which is of type T) [basic.compound]/4.3. The reinterpret_cast above is equivalent to ([expr.reinterpret.cast]/7)

T *aSP = static_cast<T *>(static_cast<void *>(aS));

which will invoke the array-to-pointer conversion [conv.array], resulting in a Struct<T>* pointing to the first element of aS. This pointer is then converted to void* (via [expr.static.cast]/4 and [conv.ptr]/2), which is then converted to T*, which would be legal via [expr.static.cast]/13:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
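As a minimal sketch of the equivalence described above, assume T is int (so that Struct<int> is standard-layout). The helper name casts_agree is hypothetical and only serves the illustration. Both spellings of the cast should produce the same pointer, and that pointer should compare equal to the address of the first element's member:

```cpp
#include <cassert>

template <class T> struct Struct { T t; };

// Hypothetical helper: returns true if both cast spellings agree for T = int.
inline bool casts_agree() {
  Struct<int> aS[10] = {};

  // Array-to-pointer conversion yields Struct<int>*, which is then cast.
  int *viaReinterpret = reinterpret_cast<int *>(aS);
  int *viaStatic = static_cast<int *>(static_cast<void *>(aS));

  // Struct<int> is standard-layout, so the result points to aS[0].t.
  assert(viaReinterpret == &aS[0].t);
  return viaReinterpret == viaStatic;
}
```

Note that this only exercises the cast and a comparison of pointer values. It deliberately does no pointer arithmetic on the resulting int*.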

Similarly,

T *aUP = reinterpret_cast<T *>(aU);

would be well-defined in C++17 if Union<T> is a standard-layout union, and looks to be well-defined in general in the coming version of C++ based on the current standard draft, where a union and one of its members are always pointer-interconvertible [basic.compound]/4.2.
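Whether the standard-layout precondition holds for a given T can be checked at compile time. The following is a sketch under assumptions: the type Mixed and the helper cast_target_is_standard_layout are hypothetical names introduced here, and the check covers only the standard-layout condition discussed above, not everything that matters for the cast.

```cpp
#include <type_traits>

template <class T> struct Struct { T t; };
template <class T> union Union { T t; };

// Hypothetical type that is not standard-layout: its non-static data
// members have different access control.
struct Mixed {
public:
  int a;
private:
  int b;
};

// Hypothetical helper: true if both wrappers around T are standard-layout.
template <class T>
constexpr bool cast_target_is_standard_layout() {
  return std::is_standard_layout<Struct<T>>::value &&
         std::is_standard_layout<Union<T>>::value;
}

static_assert(cast_target_is_standard_layout<int>(),
              "Struct<int> and Union<int> are standard-layout");
static_assert(!cast_target_is_standard_layout<Mixed>(),
              "a non-standard-layout member makes the wrappers non-standard-layout");
```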

All of the above is irrelevant, however, because

T valueS = aSP[9];

and

T valueU = aUP[9];

will invoke undefined behavior no matter what. aSP[9] and aUP[9] are (by definition) the same as *(aSP + 9) and *(aUP + 9) respectively [expr.sub]/1. The pointer arithmetic in these expressions is subject to [expr.add]/4

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.

Otherwise, if P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i+j] if 0≤i+j≤n and the expression P - J points to the (possibly-hypothetical) element x[i−j] if 0≤i−j≤n.

Otherwise, the behavior is undefined.

aSP and aUP do not point to an element of an array. Even if aSP and aUP would be pointer-interconvertible with T, you'd only ever be allowed to access element 0 and compute the address of (but not access) element 1 of the hypothetical single-element array…
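For contrast, here is a sketch of an access pattern that stays within [expr.add]/4: do the pointer arithmetic on a pointer to the array element type (Struct<T> or Union<T>) and only then name the member. The function name reads_match is hypothetical, and the sketch assumes T is a simple type such as int that can be assigned and compared.

```cpp
#include <cassert>

template <class T> struct Struct { T t; };
template <class T> union Union { T t; };

// Hypothetical helper: stores init in element 9 of each array and reads it
// back without ever forming a T* into the array.
template <class T>
bool reads_match(T init) {
  Struct<T> aS[10] = {};
  Union<T> aU[10] = {};

  aS[9].t = init;
  aU[9].t = init;  // t is the only member, so it is the active one

  Struct<T> *pS = aS;  // points to an element of an array of 10 Struct<T>
  Union<T> *pU = aU;   // points to an element of an array of 10 Union<T>

  // pS + 9 and pU + 9 stay inside the arrays they actually point into.
  T valueS = (pS + 9)->t;
  T valueU = (pU + 9)->t;

  assert(valueS == aS[9].t);
  return valueS == init && valueU == init;
}
```

A call such as reads_match(42) indexes through the array's real element type, which is the property that aSP[9] and aUP[9] lack.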

I don't understand [basic.compound]/4.2 like you. For me here we have a pointer to an union and a pointer to the same type as the element of the union. Not "one is a union object and the other is a non-static data member of that union". May be wrong though — Martin Morterol, May 07 '19 at 13:43
@MartinMorterol I'm not sure what you mean. A (standard-layout) union and one of its elements being pointer-interconvertible simply means (via [expr.static.cast]/13 as quoted above) that the result of casting a pointer to one to the type "pointer to the type of the other" will be a valid pointer value that points to the other object… — Michael Kenzel, May 07 '19 at 14:03
Ok, thx i got it. But I don't get why you say "aUP do not point to an element of an array" From [conv.array] "The result is a pointer to the first element of the array.". So, aUP should point to an element of an array ? — Martin Morterol, May 08 '19 at 11:43
@MartinMorterol the result of the array-to-pointer conversion on `aS` and `aU` points to an object which is the first element of an array. But the result of the `reinterpret_cast` will (in the best case) point to an object which is a subobject (a member) of the first object, and that subobject is not element of an array… — Michael Kenzel, May 08 '19 at 11:51
So is it true that `T valueS = aSP[0]` is undefined behavior? — Leonid, May 10 '19 at 02:25

Martin Morterol · Answer 2 · 2019-05-08T11:44:39.723

If we look at the cppreference documentation of reinterpret_cast:

5) Any object pointer type T1* can be converted to another object pointer type cv T2*. This is exactly equivalent to static_cast<cv T2*>(static_cast<cv void*>(expression)) (which implies that if T2's alignment requirement is not stricter than T1's, the value of the pointer does not change and conversion of the resulting pointer back to its original type yields the original value). In any case, the resulting pointer may only be dereferenced safely if allowed by the type aliasing rules (see below)

Now, what do the aliasing rules say?

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

AliasedType and DynamicType are similar.

AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.

AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

So it's neither the second nor the third case. Maybe the first?

Similar:

Informally, two types are similar if, ignoring top-level cv-qualification:

they are the same type; or

they are both pointers, and the pointed-to types are similar; or

they are both pointers to member of the same class, and the types of the pointed-to members are similar; or

they are both arrays of the same size or both arrays of unknown bound, and the array element types are similar.

And, from C++17 draft:

Two objects a and b are pointer-interconvertible if:

they are the same object, or

one is a union object and the other is a non-static data member of that object ([class.union]), or

one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or

there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note]

So, as I read it:

T *aSP = reinterpret_cast<T *>(aS); // Is OK
T *aUP = reinterpret_cast<T *>(aU); // Is OK.

Leonid · Answer 3 · 2019-05-10T02:24:03.700

I found "c++ - Is sizeof(T) == sizeof(int)". It establishes that a struct does not have to have the same size as its single member (sigh). The same probably applies to unions (after reading the answers, I am led to believe so). That alone is enough to make this situation UB. However, if sizeof(Struct) == sizeof(T), then given the "It's well-established that" passage in https://stackoverflow.com/a/21515546, aSP[9] would be at the same location as aS[9] (at least I think so), and reinterpret_cast'ing that is guaranteed by the standard (according to the quote in https://stackoverflow.com/a/21509729).

EDIT: This is actually wrong. The correct answer is here.
