
I have the following union:

union PM_word {
    uint8_t  u8[8];
    uint16_t u16[4];
    uint32_t u32[2];
    uint64_t u64[1];
};

Suppose I initialize an instance of this union as follows:

PM_word word;
word.u32[0] = 0;
word.u16[2] = 1;
word.u8[6] = 2;
word.u8[7] = 3;

If I understand correctly, it is undefined behavior to attempt to read word.u32[1], because the elements of word.u16 and word.u8 that have been set overlap with it. But is it also undefined behavior to read word.u32[0]?

EDIT: Retagged C++ as well. If C and C++'s semantics differ on this matter, answers about both C and C++ are greatly appreciated.

isekaijin
  • If you haven't explicitly set the value, its value will be undefined. – dangee1705 Jul 30 '19 at 20:26
  • @dangee1705: I have set the value of `word.u32[0]`. The question is whether setting the values of `word.u16[2]`, `word.u8[6]` and `word.u8[7]` afterwards prevents me from using the value of `word.u32[0]`. – isekaijin Jul 30 '19 at 20:27
  • Since the memory overlaps, changing u16 or u8 means that reading u32 is undefined. In practice, you can still read u32, and the result will probably make some sense depending on the values you set in u16 and u8. – dangee1705 Jul 30 '19 at 20:30
  • It's really platform dependent. Different computers arrange the memory differently, so the actual value you read back from u32 will differ depending on the specific computer. – dangee1705 Jul 30 '19 at 20:31
  • @NathanOliver: Please correct me if I am wrong. According to the accepted answer to your linked question, in C++, I would have to define a union of every possible packed combination of `uint32_t`s, `uint16_t`s and `uint8_t`s, right? – isekaijin Jul 30 '19 at 20:40
  • As far as I understand it, in C++ you just can't do this. You would have to `memcpy` the one array into whatever array you wanted to access it as for the behavior to be defined in C++. C++ doesn't do type punning. – NathanOliver Jul 30 '19 at 20:42
  • @NathanOliver: Thanks! – isekaijin Jul 30 '19 at 20:43
  • C and C++ differ on this. You should not tag both in one question. – Eric Postpischil Jul 30 '19 at 21:01
  • In the answer box, my answer would be downvoted. But here I can give the correct precise answer to this question: NO – A M Jul 30 '19 at 21:32

2 Answers

Relevant C++ standard quotes:

[class.union]

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended ([basic.life]). At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [... A special case regarding classes with common initial sequence, which does not apply to this example because the union has no members of class type ...]

...

When the left operand of an assignment operator involves a member access expression ([expr.ref]) that nominates a union member, it may begin the lifetime of that union member, as described below. For an expression E, define the set S(E) of subexpressions of E as follows:

  • If E is of the form A[B] and is interpreted as a built-in array subscripting operator, S(E) is S(A) if A is of array type, S(B) if B is of array type, and empty otherwise.

  • [... other cases are not relevant, because the quoted one applies ...]

In an assignment expression of the form E1 = E2 that uses either the built-in assignment operator ([expr.ass]) or a trivial assignment operator ([class.copy.assign]), for each element X of S(E1), if modification of X would have undefined behavior under [basic.life], an object of the type of X is implicitly created in the nominated storage; no initialization is performed and the beginning of its lifetime is sequenced after the value computation of the left and right operands and before the assignment. [ Note: This ends the lifetime of the previously-active member of the union, if any ([basic.life]). — end note  ]

What this means is that after the statement word.u8[6] = 2;, only the word.u8 member is active, and all other union members are inactive. After the last statement word.u8[7] = 3;, only word.u8[6] and word.u8[7] have initialised values.

The lifetime of the inactive members has ended. Here is a relevant rule regarding whether it is OK to read their value:

[basic.life]

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. ... The program has undefined behavior if:

  • the glvalue is used to access the object, or
  • ...

So, yes. Accessing the inactive members would be UB (except when that access is an assignment, or in the special case involving common initial sequence of classes, as per [class.union]).

eerorika

I would say that the behaviour is not undefined, just not what you might expect. Since, as you pointed out, the memory overlaps, you cannot use (set) all of the members at once. But, for instance, using word.u8[0], word.u8[1] and word.u16[1] at the same time would work. As soon as you use word.u64[0], all the other members are changed. Whether they are corrupted depends on the value.

Mario The Spoon