2

What are the rules that govern the uninitialized bytes of a union? (Assuming some are initialized.)

Below is a 32-byte union of which I initialize only the first 16 bytes via the first member. It seems the remaining bytes are zero-initialized. That's great for my use case, but I am wondering what rule is behind this. I was expecting garbage.

#include <cstdint>
#include <iostream>

using namespace std;

union Blah {
   struct {
      int64_t a;
      int64_t b;
   };
   int64_t c[4];
};

int main()
{
   Blah b = {{ 1, 2 }}; // initialize first member, so only the first 16 bytes.

   // prints 1, 2, 0, 0 -- not 1, 2, <garbage>, <garbage>
   cout << b.c[0] << ", " << b.c[1] << ", " << b.c[2] << ", " << b.c[3] << '\n';

   return 0;
}

I've compiled on GCC 4.7.2 with -O3 -Wall -Wextra -pedantic (that last one required giving a name to the anonymous struct). That hopefully should save me from being lucky.

I've also tried to overlay two variables with two different scopes on the stack but gcc didn't give them the same address.

I've also tried replacing the array with another struct, in case that mattered, but it didn't change anything.

I can't access online compilers from here, they're blocked by my work.

timrau
J.N.

3 Answers

6

The most pertinent part of the C11 standard is 6.2.6.1.7, although it does not speak specifically to initialization:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values.

Section 6.7.9.17 says:

Each brace-enclosed initializer list has an associated current object. When no designations are present, subobjects of the current object are initialized in order according to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union.

but doesn't explicitly come out and say the other bits are not initialized. For static unions, 6.7.9.10 says:

the first named member is initialized (recursively) according to these rules, and any padding is initialized to zero bits;

so the first named member and any padding bits would be zero-initialized, but the bits corresponding to other (by implication, larger) members of the union would be unspecified.

So you cannot count on those extra bytes being initialized to zero.

Note that technically, even if you do initialize your c array to zero, the moment you store something in your struct those excess bits become unspecified again, and you can't count on them still being zero. There's a lot of code out there which assumes the excess bytes keep their values (e.g. putting a char array in a union to access the individual bytes), and in reality they probably will, but the standard doesn't guarantee it.

Crowman
  • Your answer actually says that we *can't* rely on C++ when we work with unions, containing members with *different* size, right? So, what can we do to guarantee that the assignment of smaller (by size) value to the union does not produce garbage in extra bits? (And the same question for union constructor with smaller type) – HEKTO Nov 16 '15 at 19:32
  • 1
    @HEKTO: First, note this answer is from a C rather than a C++ perspective, although I know the question specifies C++. And you can't do anything to guarantee it, at least from a standard point of view. Again, it's pretty unlikely on a normal computer that those extra bits would, in fact, randomly change. You're only supposed to use one member of a union at a time, so standards-wise, you shouldn't care about those extra bits anyway. If you do, you have to look outside of the standard for your reassurance. – Crowman Nov 16 '15 at 23:34
1

Brace-enclosed initializers for a union are only permitted to initialize the first member. This is fine, and your initializer does initialize the anonymous struct, and causes the first member to be the active member.

In C++ only one member of a union may be active at any time. Trying to read the other members via the union causes undefined behaviour. Trying to read them by aliasing them as a character type gives unspecified values.

M.M
  • not 100% sure that UB is caused by reading the inactive member (as opposed to reading garbage which has a chance of UBing depending on Stuff ... this stuff changes all the time!) – M.M Nov 07 '14 at 03:29
  • So there's no aliasing possible through unions in C++ (even if I'd pay attention and have both members of the same size) ? – J.N. Nov 07 '14 at 03:29
  • @JN yes that's correct. C permits aliasing through unions and C++ doesn't. – M.M Nov 07 '14 at 03:29
  • Pedantically speaking your code isn't aliasing, since it's `int` in both cases ; the rule is that you can only reliably access the same member that was last set. – M.M Nov 07 '14 at 03:30
0

So I would say that the observed behavior is backed by the standard.

ISO/IEC 9899:201x in 6.7.9 (Initialization) paragraph 21 says:

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

Static objects are initialized to 0 (see 6.7.9.10 or "The initialization of static variables in C").

Varyag