12

How should I zero out an anonymous union? I couldn't find anything about it on the cppreference page. Would memsetting its largest member with 0 work here?

For example -

#include <iostream>
#include <cstring>

struct s{
    char a;
    char b[100];
};

int main(){
 union {
   int a;
   s b;
   char c;
 };

  // b.a = 'a'; (1)

  std::memset(&b, 0, sizeof(b));

  std::cout << a << "\n";
  std::cout << b.a << " " << b.b << "\n";
  std::cout << c << "\n";
}

Also if this would work, should I uncomment (1) before using memset() to activate the largest member?

Abhinav Gauniyal
  • Why are you creating an anonymous union in the first place? – zett42 Feb 21 '17 at 08:14
  • 4
    There's no instance of the union here, only the declaration of its type. – Quentin Feb 21 '17 at 08:16
  • 3
    @Quentin Why does this compile then? – xinaiz Feb 21 '17 at 08:19
  • @zett42, using an anonymous union is basically a convenience compared with named unions. The use of anonymous unions in function scope could save some stack space in theory, but that's not justifiable today since compilers can reuse the storage of objects with automatic storage duration if they determine that lifetimes or usage (in the case of integral types) don't overlap. – WhiZTiM Feb 21 '17 at 08:25
  • 4
    Why not simply add a name to that union so you don't have to care which member is the biggest? `union { int a; s b; char c; } u = { 0 };` problem solved... – zett42 Feb 21 '17 at 08:25
  • 5
    @BlackMoses because [I'm mistaken](http://en.cppreference.com/w/cpp/language/union#Anonymous_unions)! That's a TIL for me. I knew that worked for class members, didn't know about function-local variables. – Quentin Feb 21 '17 at 08:43
  • @WhiZTiM it does help with wrapping some C code to prevent strict aliasing too. – Abhinav Gauniyal Feb 21 '17 at 09:05
  • 1
    @zett42 because I want to and it is perfectly legal. – Abhinav Gauniyal Feb 21 '17 at 09:05
  • 1
    Union aliasing is not permitted in Standard C++ ; even if you get the initialization figured out, your later code causes undefined behaviour – M.M Feb 21 '17 at 13:21
  • @M.M where is aliasing in the example I gave above? Oh okay, I'll make sure to init other members before accessing them too. But I still need to zero them out first. The above example was just to check if they are actually zero or not. But you certainly provided me a better idea: to zero a specific member before its use! – Abhinav Gauniyal Feb 21 '17 at 13:27
  • 1
    You can only read the member that was last written. (Common compilers support C-like union aliasing as a non-standard extension though). – M.M Feb 21 '17 at 13:32
  • @M.M what is meant by written, does `memcpy` or `memset` qualify or just `=` operator does? – Abhinav Gauniyal Feb 21 '17 at 13:35
  • @AbhinavGauniyal Not sure offhand, union initialization is complicated :) – M.M Feb 21 '17 at 13:37
  • Can you be more precise about what it is you're trying to accomplish? Is your goal to ensure that you can read any member and it will have a zero value? If not, what is your goal? – David Schwartz Feb 21 '17 at 19:20
  • @DavidSchwartz to read a member's value that might not be fully initialized yet, and not get garbage but 0s. Eg imagine in above example there was another member struct `s2` with `int a` & `char b[20]`. Now when I do `s2.a = 5;`, `s2.b` is still uninitialized and might have garbage values on access. So I want to zero out whole union so that I can expect uniform 0 value on failure. – Abhinav Gauniyal Feb 21 '17 at 19:27
  • Note the [interesting case for bool](http://stackoverflow.com/q/33380742/1708801) which even if you can make the rest defined is still a corner case. – Shafik Yaghmour Feb 27 '17 at 04:30
  • 1
    @M.M "You can only read the member that was last written." Not true. The compiler will not prevent you from doing so. Reading any member but the last written one is undefined behavior. This is an important distinction. It's also important to distinguish 'undefined behavior' per the language spec vs. what the compiler/OS/processor are going to do. – Rob K Mar 01 '17 at 18:22
  • @RobK By "You cannot do X", I mean "Doing X is not well-defined behaviour" – M.M Mar 01 '17 at 19:45

3 Answers

6

If you want to follow the standard strictly, you should know that the code you have written has undefined behaviour. C++ standard §3.8 [basic.life]:

... except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union (8.6.1, 12.6.2), or as described in 9.3. The lifetime of an object o of type T ends when: (1.3) — if T is a class type with a non-trivial destructor (12.4), the destructor call starts, or (1.4) — the storage which the object occupies is released, or is reused by an object that is not nested within o (1.8).

§9.3 explains that you can activate a member of a standard-layout union by assigning to it. It also explains that you can inspect the value of a union member which is not active only when certain criteria are met:

If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see 9.2. — end note ]

So when you write std::cout << a << "\n", you have neither initialized a nor activated it by an assignment, and no member has been initialized, so you are in undefined behaviour. (Note: the compilers I know support it, at least on PC, as an extension to the standard.)

So before using a you will have to write a=0, or make a the initialized member of the union, because a does not share a common initial sequence with either b or c.
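As a minimal sketch of that rule (the helper names `read_after_activation` and `switch_active_member` are made up for illustration, and the struct `s` is copied from the question), assigning to a member before reading it keeps the read on the active member:

```cpp
struct s {
    char a;
    char b[100];
};

// Hypothetical helper: activates `a` by assignment, then reads it back.
int read_after_activation() {
    union {
        int a;
        s b;
        char c;
    };
    a = 0;     // the assignment makes `a` the active member
    return a;  // reading the active member is well-defined
}

// Hypothetical helper: switches the active member from `a` to `c`.
char switch_active_member() {
    union {
        int a;
        s b;
        char c;
    };
    a = 42;    // `a` is active here
    c = 'x';   // now `c` is active; `a` must not be read any more
    return c;
}
```

Calling `read_after_activation()` returns 0 and `switch_active_member()` returns 'x'; neither function reads a member that was not the last one written.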

So if you use memset, as also proposed in MSalters's answer, whatever you do you will still have to assign something to a member of the union before using it. If you want to stay in defined behaviour, do not use memset. Note that memset can safely be used with standard-layout objects that are not members of a union, since their lifetime begins when storage is obtained for them.


In conclusion, to stay in defined behaviour you must initialize at least one member; then you can inspect other members of the union that share a common initial sequence with the initialized member.

  1. If your intent is to use an anonymous union in the main function, you can declare the union static: all static objects are zero-initialized. (But they are not reinitialized when the function is called again, which will not happen with main()):

    int main(){
     static union {
      s b;
      int a;
      char c;
      };
     //...
     }
    

    As described in C++ standard §8.6 article (6.3) [dcl.init]:

    if T is a (possibly cv-qualified) union type, the object’s first non-static named data member is zero-initialized and padding is initialized to zero bits;

  2. Otherwise, if there is no padding between the members of the structure (s), you can aggregate-initialize the largest member (s) with an empty list:

    //...
    int main(){
      union {
       int a;
       s b{};
       char c;
       };
      //...
      }
    

    This works because all members of a union start at the same address. So if there is no padding between the members of s, every byte covered by s will be zero-initialized. C++ standard §9.3 [class.union] article 2:

    The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. [ Note: A union object and its non-static data members are pointer-interconvertible (3.9.2, 5.2.9). As a consequence, all non-static data members of a union object have the same address.

  3. If there is padding inside s, then just declare an array of char for initialization purposes:

    //...
    int main(){
      union {
       char _initialization[sizeof(s)]{};
       int a;
       s b;
       char c;
       };
      //...
      }
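
The char-array approach can be checked with a small sketch. The helper names and the member name `init_bytes` are hypothetical, and a compiler with C++11 default member initializers is assumed. Because the array is the initialized member, reading it back stays on the active member:

```cpp
#include <cstddef>

struct s {
    char a;
    char b[100];
};

// Hypothetical helper: the char array is the initialized (active) member,
// so its bytes can be read back directly.
bool all_bytes_zero_after_init() {
    union {
        char init_bytes[sizeof(s)]{};  // hypothetical name; value-initialized to zeros
        int a;
        s b;
        char c;
    };
    for (std::size_t i = 0; i < sizeof(init_bytes); ++i) {
        if (init_bytes[i] != 0) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: only takes addresses, never reads an inactive member.
bool members_share_address() {
    union {
        char init_bytes[sizeof(s)]{};
        int a;
        s b;
        char c;
    };
    const void* base = static_cast<const void*>(&init_bytes);
    return base == static_cast<const void*>(&a) &&
           base == static_cast<const void*>(&b) &&
           base == static_cast<const void*>(&c);
}
```

Both helpers are expected to return true: the first confirms the array's bytes are zero, the second that every member starts at the same address, which is what the quoted [class.union] paragraph states.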
    

Note: your example using memset and the code from options 2 and 3 all produce exactly the same set of instructions for initialization (clang, x86_64):

    pushq   %r14
    pushq   %rbx
    subq    $120, %rsp
    xorps   %xmm0, %xmm0
    movaps  %xmm0, 96(%rsp)
    movaps  %xmm0, 80(%rsp)
    movaps  %xmm0, 64(%rsp)
    movaps  %xmm0, 48(%rsp)
    movaps  %xmm0, 32(%rsp)
    movaps  %xmm0, 16(%rsp)
    movq    $0, 109(%rsp)
Oliv
  • is it producing same code for multiple `memset`s to each union member? – Abhinav Gauniyal Feb 21 '17 at 13:08
  • @AbhinavGauniyal No, in a union there is only one member initialized. This is the fundamental idea behind a union; MSalters's answer is nonsense. I honestly cannot understand why it has been upvoted. – Oliv Feb 21 '17 at 16:47
  • 3
    @AbhinavGauniyal As a general rule, when you see memset, memcpy, etc. in C++ code, even for trivial types, remember there is C++-style code which is at least as efficient as these C-family functions. Remember that the people who have developed the C++ standard have already thought about replacing the use of these old C-family functions. For example, many papers have proven that std::copy is more efficient than memcpy or that std::sort is much more efficient than qsort... – Oliv Feb 21 '17 at 16:55
3

Just memset every member, and count on the optimizer to eliminate redundant writes.
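
A rough sketch of what that could look like, using the union from the question (the helper name `memset_every_member` is made up). It inspects the storage as raw bytes rather than through a member, since whether a member can be read after memset alone is exactly what the other answer disputes:

```cpp
#include <cstddef>
#include <cstring>

struct s {
    char a;
    char b[100];
};

// Hypothetical helper: zeroes the storage of each member in turn, then
// checks the bytes of the largest member through an unsigned char pointer.
bool memset_every_member() {
    union {
        int a;
        s b;
        char c;
    };
    std::memset(&a, 0, sizeof(a));
    std::memset(&b, 0, sizeof(b));
    std::memset(&c, 0, sizeof(c));

    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof(b); ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

The three writes overlap, so an optimizer is free to merge them; the sketch does not depend on that happening.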

MSalters
1

Just sharing an idea: maybe we can use metaprogramming like this:

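#include <type_traits>  // std::conditional

// Picks the larger of two types by sizeof (T1 wins ties)
template<typename T1, typename T2>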
struct Bigger
{
  typedef typename std::conditional<sizeof(T1) >= sizeof(T2), T1, T2>::type Type;
};

// Recursion helper
template<typename...>
struct BiggestHelper;

// 2 or more types
template<typename T1, typename T2, typename... TArgs>
struct BiggestHelper<T1, T2, TArgs...>
{
    typedef typename Bigger<T1, typename BiggestHelper<T2, TArgs...>::Type>::Type Type;
};

// Exactly 2 types
template<typename T1, typename T2>
struct BiggestHelper<T1, T2>
{
    typedef typename Bigger<T1, T2>::Type Type;
};

// Exactly one type
template<typename T>
struct BiggestHelper<T>
{
    typedef T Type;
};

template<typename... TArgs>
struct Biggest
{
    typedef typename BiggestHelper<TArgs...>::Type Type;
};

So in the main function we can do this:

std::memset(&b, 0, sizeof(Biggest<int,s,char>::Type));
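
As a shorter alternative to the templates above, and not part of the original idea, the largest size can be computed directly with the `std::initializer_list` overload of `std::max`, which is constexpr from C++14 on. The names `kLargestMember` and `zero_largest_member` are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

struct s {
    char a;
    char b[100];
};

// Hypothetical constant: size of the largest union member, assuming C++14.
constexpr std::size_t kLargestMember =
    std::max({sizeof(int), sizeof(s), sizeof(char)});

static_assert(kLargestMember == sizeof(s), "s is expected to be the largest member");

// Hypothetical helper: zeroes kLargestMember bytes starting at the union's
// address, then checks them as raw bytes.
bool zero_largest_member() {
    union {
        int a;
        s b;
        char c;
    };
    std::memset(&b, 0, kLargestMember);

    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < kLargestMember; ++i) {
        if (bytes[i] != 0) {
            return false;
        }
    }
    return true;
}
```

This yields a size rather than a type, which is all that the memset call needs.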
Ron Tang