Initializing a union with a non-trivial constructor

Question

I have a structure with a custom constructor that initializes its members to 0. I've seen that older compilers, in release mode, leave the values uninitialized unless I memset them to 0.

I now want to use this structure in a union, but get errors because it has a non-trivial constructor.

So, question 1: Does the compiler-generated default constructor guarantee that all members of a structure will be zero-initialized? My non-trivial constructor just does a memset of all the members to 0 to ensure a clean structure.

Question 2: If a constructor must be specified on the base structure, how can a union be implemented to contain that element and ensure a 0 initialized base element?

David Rodríguez - dribeas · Accepted Answer · 2008-11-26T19:44:57.650

Question 1: Default constructors do initialize POD members to 0 according to the C++ standard. See the quoted text below.

Question 2: If a constructor must be specified in a base class, then that class cannot be part of a union.

Finally, you can provide a constructor for your union:

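#include <cstring> // std::memset

// A and B are POD types defined elsewhere.
union U
{
   A a;
   B b;

   U() { std::memset( this, 0, sizeof( U ) ); } // zero the whole union
};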

For Q1:

From C++03, 12.1 Constructors, pg 190

The implicitly-defined default constructor performs the set of initializations of the class that would be performed by a user-written default constructor for that class with an empty mem-initializer-list (12.6.2) and an empty function body.

From C++03, 8.5 Initializers, pg 145

To default-initialize an object of type T means:

if T is a non-POD class type (clause 9), the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
if T is an array type, each element is default-initialized;
otherwise, the object is zero-initialized.

To zero-initialize an object of type T means:

if T is a scalar type (3.9), the object is set to the value of 0 (zero) converted to T;
if T is a non-union class type, each non static data member and each base-class subobject is zero-initialized;
if T is a union type, the object’s first named data member is zero-initialized;
if T is an array type, each element is zero-initialized;
if T is a reference type, no initialization is performed.

For Q2:

From C++03, 12.1 Constructors, pg 190

A constructor is trivial if it is an implicitly-declared default constructor and if:

its class has no virtual functions (10.3) and no virtual base classes (10.1), and
all the direct base classes of its class have trivial constructors, and
for all the nonstatic data members of its class that are of class type (or array thereof), each such class has a trivial constructor

From C++03, 9.5 Unions, pg 162

A union can have member functions (including constructors and destructors), but not virtual (10.3) functions. A union shall not have base classes. A union shall not be used as a base class. An object of a class with a non-trivial constructor (12.1), a non-trivial copy constructor (12.8), a non-trivial destructor (12.4), or a non-trivial copy assignment operator (13.5.3, 12.8) cannot be a member of a union, nor can an array of such objects.

What's missing here is that despite the name, default constructors do not default-initialize POD members. 12.6.2/4 says what happens when a member is not mentioned in an initializer list, and by your quote from 12.1, this applies to implicit ctors. It says, "If the entity is a non-static data member of ... class type ... and the entity class is a non-POD class, the entity is default-initialized ... otherwise, the entity is not initialized". So, POD data members are not initialized by the implicitly-generated constructor. Non-POD data members are default-initialized. — Steve Jessop, Dec 22 '09 at 02:25
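The distinction drawn in the comment above can be sketched as follows. This is a minimal illustration, not code from the thread: the `Pod` struct and both helper functions are hypothetical names. It shows that value-initialization with empty parentheses zeroes a POD, while a plain declaration leaves the members indeterminate, so they must be assigned before being read.

```cpp
// Hypothetical POD struct with no user-declared constructor.
struct Pod {
    int    count;
    double price;
};

// Value-initialization: the empty parentheses zero-initialize every member.
inline Pod make_zeroed() {
    return Pod();
}

// Default-initialization: the members start out indeterminate, so each one
// is assigned before anything reads it.
inline Pod make_assigned() {
    Pod p;          // p.count and p.price are indeterminate here
    p.count = 7;
    p.price = 1.5;
    return p;
}
```

The example never reads an uninitialized member, because doing so would be undefined behaviour rather than a reliable way to observe garbage.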

score 43 · Answer 2 · edited Feb 01 '16 at 05:32


Things changed for the better in C++11.

You can now legally do this, as described by Stroustrup himself (I reached that link from the Wikipedia article on C++11).

The example on Wikipedia is as follows:

#include <new> // Required for placement 'new'.

struct Point {
    Point() {}
    Point(int x, int y): x_(x), y_(y) {}
    int x_, y_;
};

union U {
    int z;
    double w;
    Point p; // Illegal in C++03; legal in C++11.
    U() {new(&p) Point();} // Due to the Point member, a constructor
                           // definition is now *required*.
};

Stroustrup goes into a little more detail.
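For a member with a non-trivial destructor as well, a C++11 union usually needs a tag and explicit lifetime management. The following is a rough sketch under that assumption; `IntOrText`, `Storage`, and all member names are hypothetical and do not come from Stroustrup's or Wikipedia's text.

```cpp
#include <cassert>
#include <new>      // placement new
#include <string>

// Hypothetical tagged wrapper around a C++11 unrestricted union.
class IntOrText {
public:
    IntOrText() : is_text_(false) { new (&value_.number) int(0); }
    ~IntOrText() { clear(); }

    // Copying is disabled to keep the sketch short.
    IntOrText(const IntOrText&) = delete;
    IntOrText& operator=(const IntOrText&) = delete;

    void set_number(int n) {
        clear();
        new (&value_.number) int(n);
    }
    void set_text(const std::string& s) {
        clear();
        new (&value_.text) std::string(s);   // start the string's lifetime
        is_text_ = true;
    }

    bool is_text() const { return is_text_; }
    int number() const { assert(!is_text_); return value_.number; }
    const std::string& text() const { assert(is_text_); return value_.text; }

private:
    // Destroy the string if it is the active member.
    void clear() {
        if (is_text_) {
            value_.text.~basic_string();     // end the string's lifetime
            is_text_ = false;
        }
    }

    union Storage {
        int number;
        std::string text;
        Storage() {}    // needed: std::string has a non-trivial constructor
        ~Storage() {}   // needed: std::string has a non-trivial destructor
    } value_;
    bool is_text_;
};
```

The union itself cannot know which member is active, which is why the enclosing class carries the `is_text_` flag and calls the destructor by hand.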

edited Feb 01 '16 at 05:32

Rabbid76


answered Oct 22 '15 at 20:20

dan-man



Technically your advice a constructor is required is not correct. Like any other member it is only required if it is used. Raw storage could be initialised to either an int, double, or Point, and then a pointer to U could be used to access it (followed by the appropriate field name). Example uses include interpreting a stream of objects, emulating a stack, or simply interpreting a value allocated on the heap. – Yttrill May 19 '16 at 13:20
@Yttrill while it's true that you don't need to define a constructor unless you construct an instance of `U`, I don't believe it's legal to use a `U *` to access any member of the union if no union object has been constructed. [basic.life] _Before the lifetime of an object has started but after the storage which the object will occupy has been allocated [...] any pointer that refers to the storage location [...] may be dereferenced but [...]. The program has undefined behavior if [...] the pointer is used to access a non-static data member or call a non-static member function of the object_ – davmac Nov 11 '16 at 15:42
@davmac: it's tricky. Certain aliases allow things you wouldn't expect from a naive reading, especially considering the strict aliasing rules. The C committee also made a mistake restricting the layout of integral types, which allows one to prove things are valid that otherwise wouldn't be. If you combine these rules you end up with contradictions in the standards. Potentially legal aliases include integers and unsigned integers of the same size with a shared common value, uintptr_t and any pointer, unsigned char and anything at all, and any suitably large unsigned integer and any store, initialised or not. – Yttrill Dec 04 '16 at 03:50
@Yttrill layout alone does not allow aliasing where it is otherwise invalid. Using a union pointer to access a field member is not just accessing a member; the union object is also accessed, and that is what is problematic in this case; if there's no suitable constructor available, then it must be illegal as per the paragraph quoted above. – davmac Dec 04 '16 at 10:39
@Yttrill (or rather, when there is no union object, access via the union member operator is access via the union type, and that cannot alias arbitrarily with its member types, at least according to the GCC developers' standpoint and I think various others). – davmac Dec 04 '16 at 12:03
I think the difficulty is that the union doesn't need a constructor. You can in some cases memcpy an already constructed value into a component OR if there is a component of type X in union U, it can be constructed externally. There is no requirement in C++11 that a constructor for U exist, unless one is required by usage. I know about this rule since I fought for it in C++89 and failed. BTW: the GCC developers broke heaps of previously working code, and denied others previously utilised optimisations (my system had to turn off strict aliasing). – Yttrill Dec 06 '16 at 06:19
@davmac: BTW you may find it fun to look at the use of a C++11 union in https://github.com/felix-lang/addrstore/blob/master/node16.hpp which I hope is well defined (it compiles and runs but I can't be sure). The need in that example arose because of a pair (pointer, iterator) where, if the pointer is NULL, the iterator is undefined. But a constructor is still required and I don't have one, so I have to make a union of two structs, one with and one without the iterator, so I don't have to construct the iterator. – Yttrill Dec 06 '16 at 06:31
BTW: there are times when the failure of the C++ Standard to ensure behaviour is irrelevant if you have other knowledge which does ensure it. There's no definition in C++ of conforming program. For example I have a garbage collector which regularly examines the machine stack and uninitialised store. If the compiler/linker ever broke that (which it could be entitled to) they'd better provide a switch to disable the optimisation which breaks it or QOI issues say I'll swap to another compiler. – Yttrill Dec 06 '16 at 06:44
@Yttrill "You can in some cases memcpy an already constructed value into a component OR if there is a component of type X in union U, it can be constructed externally" the problem is that you can't have a union object (and therefore component member) to memcpy into if you can't construct such an object. I know that in practice what you're suggesting will usually work just fine (and with some compilers might be guaranteed to work) but I think it is technically invalid, according to a strict reading of the standard. – davmac Dec 06 '16 at 11:14
@davmac: Sorry, I'm unsure which object you mean when you say "if you can't construct such an object". A union U is a type declaration. There is no requirement that any constructors exist for the union, nor for any of the component types. You can still define the type .. in C++11 that is. The only requirement for constructors to exist is if you try to actually use one. A union without any constructors is perfectly good for address calculations. [see next] – Yttrill Dec 07 '16 at 13:13
For example, in C++11 you don't need to do the following (because of other facilities) but you can declare union of any types for the purposes of calculating the maximum size and alignment of the types. The union never gets constructed. Even the types involved may never be constructed. In C++89, you were not able to do this, and, there was no feature to calculate alignment either. – Yttrill Dec 07 '16 at 13:16
@davmac: you might also note your language "technically invalid" is meaningless nonsense (not meaning to be offensive but technically accurate). In C++ a program can have the property "ill formed" which means that a conforming compiler must issue a diagnostic error message. That doesn't mean that the program doesn't have deterministic semantics! Weird, but true. There are also phrases in the Standard "XXX is undefined behaviour". Most people have no idea what that means. It means absolutely nothing at all! [next] – Yttrill Dec 07 '16 at 13:20
In fact "undefined behaviour" is written as a cross-check for the Standards committee itself. It has no other purpose. It's there so someone can say "but X says this is defined and Y says undefined, please clarify!". There are in fact many cases where the Standard says undefined, but other rules, perhaps including implementation-defined rules, allow one to deduce the behaviour. [next] – Yttrill Dec 07 '16 at 13:26
For example, if an unsigned int is 4 bytes, and it has maximum value 2^32-1, then loading an uninitialised value of that type from allocated store is well defined. The value loaded may be indeterminate, but a value 0 to 2^32-1 it necessarily will have. You can deduce this fact. It has to work. The compiler cannot reject the program or fail to perform the operation. [next] – Yttrill Dec 07 '16 at 13:26
A more interesting and useful example perhaps: bit fiddling the low bits of pointers. I do that in my garbage collector. It's well defined (provided uintptr_t is defined). It's common practice in low level code. – Yttrill Dec 07 '16 at 13:28
Roughly speaking, the ctor/assign/dtor rules only apply to abstracted types not concrete ones. And sometimes you can even figure out things about them too. This is made worse if new C99 rules are applied to C++, which puts very bad restrictions on representations of integers (due to ignorance of basic mathematics in the ISO C committee). – Yttrill Dec 07 '16 at 13:31
@Yttrill, I disagree with what you are saying. Using an uninitialised variable is specifically undefined and may not necessarily generate a value, even if a bit pattern of the appropriate size can only represent a valid value; the program may just crash immediately, or otherwise behave in an unpredictable manner, _because the compiler is allowed to assume that you **will not do it** and generate code accordingly_ (and is even allowed to detect if you do it, and generate a diagnostic and terminate program execution). UB is defined _explicitly_ in the standard in a way that allows for this. – davmac Dec 07 '16 at 13:39
@Yttrill I don't think we can have a meaningful discussion about this answer without having an agreement on the ramifications of executing code that invokes specifically undefined behaviour, and anyway this is not really meant to be a forum for extended discussion, so let's agree to disagree. – davmac Dec 07 '16 at 13:43
Yes, the compiler can assume that if it has enough information. But it may not. My GC does it, and the compiler can't know because it is separately compiled. You also miss the point that undefined has no meaning. Other rules may specify behaviour in one place which is undefined in another. I know how the conformance model works. I was there :-) – Yttrill Dec 08 '16 at 14:13
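The uncontroversial part of the exchange above, declaring a union purely to compute size and alignment without ever constructing it, might look like the sketch below. `Layout`, `Slot`, and `construct_in_slot` are hypothetical names chosen for illustration. The object is created in the raw buffer with placement new, so no `Layout` object is ever accessed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>

// Never instantiated: its default constructor and destructor are implicitly
// deleted, but sizeof and alignof still work on the type.
union Layout {
    int i;
    double d;
    std::string s;
};

// Raw storage large and aligned enough for any member of Layout.
struct Slot {
    alignas(Layout) unsigned char bytes[sizeof(Layout)];
};

inline std::size_t construct_in_slot() {
    Slot slot;
    assert(reinterpret_cast<std::uintptr_t>(slot.bytes) % alignof(Layout) == 0);

    std::string* s = new (slot.bytes) std::string("abc");  // lifetime starts
    const std::size_t n = s->size();
    s->~basic_string();                                    // lifetime ends
    return n;
}
```

This sidesteps the disputed question of accessing members through a `U*` when no union object exists, since the union type is used only in `sizeof` and `alignas`.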

In the example you give, which deviates from Wikipedia, there is no need for placement new. By providing a user-defined constructor for `U`, like so: `U() {}`, its member `p` will be default-initialized to an indeterminate value. Obviously, this does not make `p` the active member, so reading from any of its members would be UB for multiple reasons. Member `p` can also be activated without using placement new because its copy-constructor is trivial, as in: `U u; u.p = {1, 2};`. – 303 Jan 09 '22 at 02:09

@303: This seems to work: `U() : p() {}` ([GodBolt](https://godbolt.org/)). – einpoklum Apr 11 '22 at 20:51

@einpoklum That constructor seems to allow a use case similar to: `U u; u.p.x_ = 3;` Although, I'm not sure if that is in line with the example code you wanted to share. Anyway, looking back at my previous comment, I will delete it soon as it turned out to be partly incorrect. I had probably expected that the assignment expression `U u; u.p = {1, 2};` would be considered a form of copy-list-initialization and in turn would consider `p`'s trivial copy constructor during overload resolution. – 303 Apr 13 '22 at 14:20

Having taken a closer look, it appears that the code `U u; u.p = {1, 2};` actually invokes the automatically generated copy assignment operator of `p` and not any of its constructors. Since the assignment operation can only be performed on a pre-existing object, the behavior is undefined. Placement new is required to start the lifetime of an inactive union member that is non-trivial, e.g.: `U u; new (&u.p) Point{1, 2};`. – 303 Apr 13 '22 at 14:20
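The last comment's point can be sketched like this. It reuses the `Point` and `U` shapes from the answer, but the `U() : z(0) {}` constructor and the `switch_to_point` helper are assumptions made for the illustration. Placement new, not assignment, is what makes `p` the active member.

```cpp
#include <new>   // placement new

struct Point {
    Point() {}
    Point(int x, int y) : x_(x), y_(y) {}
    int x_, y_;
};

union U {
    int z;
    double w;
    Point p;
    U() : z(0) {}   // assumption for this sketch: z starts as the active member
};

inline int switch_to_point() {
    U u;                       // u.z is active
    new (&u.p) Point(1, 2);    // reuses the storage: z's lifetime ends, p's begins
    return u.p.x_ + u.p.y_;    // reading the active member is well defined
}
```

Because `Point` has a trivial destructor, no explicit destructor call is needed before the union goes out of scope.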

score 3 · Answer 3 · 2008-11-26T20:18:19.657


AFAIK union members may not have constructors or destructors.

Question 1: no, there's no such guarantee. Any POD-member not in the constructor's initialization list gets default-initialized, but that's with a constructor you define, and has an initializer list. If you don't define a constructor, or you define a constructor without an initializer list and empty body, POD-members will not be initialized.

Non-POD members will always be constructed via their default constructor, which if synthesized, again would not initialize POD-members. Given that union members may not have constructors, you'd pretty much be guaranteed that POD-members of structs in a union will not be initialized.

Question 2: you can always initialize structures/unions like so:

struct foo
{
    int a;
    int b;
};

union bar
{
    int a;
    foo f;
};

bar b = { 0 };
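One caveat worth illustrating: brace initialization of a union initializes only its first member, so `bar b = { 0 };` above sets `a` and says nothing about `f.b`. A possible workaround, sketched here with the hypothetical name `bar_wide`, is to list the largest member first.

```cpp
struct foo {
    int a;
    int b;
};

// Hypothetical variant of `bar` with the struct listed first, so that the
// braces initialize both of its fields.
union bar_wide {
    foo f;
    int a;
};

inline bool wide_is_zeroed() {
    bar_wide b = { { 0, 0 } };   // inner braces initialize b.f, the first member
    return b.f.a == 0 && b.f.b == 0;
}
```

This form also works in C++03, since it relies only on aggregate initialization.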

edited Nov 26 '08 at 20:18

answered Nov 26 '08 at 17:13


You can give the union itself a constructor that memset's itself to zero. – Greg Rogers Nov 26 '08 at 17:29
Good point! I keep forgetting about union constructors myself! – Nov 26 '08 at 17:36

There is no difference between a programmer's default constructor with no initializer list and an empty body and a compiler-generated constructor. – David Rodríguez - dribeas Nov 26 '08 at 19:50
@dribeas: thank you, I didn't write that very clearly, and updated my answer accordingly. – Nov 26 '08 at 20:18

One difference is that the former renders the class non-POD, whereas the latter doesn't. – Steve Jessop Nov 26 '08 at 20:20

score 3 · Answer 4 · edited May 23 '17 at 12:17


As mentioned in Greg Rogers' comment to unwesen's post, you can give your union a constructor (and destructor if you wish):

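#include <cstring> // std::memset

struct foo
{
    int a;
    int b;
};

union bar
{
    // Zero every byte of the union; safe because all members are POD.
    bar() { std::memset(this, 0, sizeof(*this)); }

    int a;
    foo f;
};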

edited May 23 '17 at 12:17

Community


answered Nov 26 '08 at 19:41

Adam Rosenfield



Looks like I need some education. Wouldn't memsetting the object to zero wipe out the class's virtual table? – EvilTeach Nov 26 '08 at 21:25

@EvilTeach, Two things, 1) you don't have to use a vtable to implement polymorphism (except everybody does). 2) Do you see any virtual methods on foo? Or any methods at all for that matter?. Does it inherit from anything? There's no vtable without virtual methods. In fact, were foo to have virtual methods and by extension a vtable it would no longer be a POD, and therefore ineligible for membership in the union. – Logan Capaldo Jul 15 '09 at 00:05
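With a C++11 compiler, the concern in these comments can be turned into a compile-time check. The sketch below is an assumption layered on the answer's code, not part of it: it adds a `static_assert` so the `memset` constructor stops compiling if `foo` ever gains virtual functions or other non-trivial members. The `bar_bytes_are_zero` helper is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct foo {
    int a;
    int b;
};

union bar {
    bar() {
        // Refuse to compile if foo stops being safe to overwrite bytewise.
        static_assert(std::is_trivially_copyable<foo>::value &&
                      std::is_standard_layout<foo>::value,
                      "memset is only appropriate for POD-like members");
        std::memset(this, 0, sizeof(*this));
    }

    int a;
    foo f;
};

// Inspect the object representation through unsigned char, which is always
// permitted, instead of reading a particular union member.
inline bool bar_bytes_are_zero() {
    bar b;
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&b);
    for (std::size_t i = 0; i < sizeof(b); ++i) {
        if (bytes[i] != 0) return false;
    }
    return true;
}
```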

score 0 · Answer 5 · answered Nov 26 '08 at 16:44


Can you do something like this?

#include <cstring> // std::memset

class Outer
{
public:
    Outer()
    {
        // Zero the whole union, whichever member is used later.
        std::memset(&inner_, 0, sizeof(inner_));
    }
private:
    union Inner
    {
        int qty_;
        double price_;
    } inner_;
};

...or maybe something like this?

union MyUnion
{
    int qty_;
    double price_;
};

void someFunction()
{
    MyUnion u = {0};
}

answered Nov 26 '08 at 16:44

John Dibling


We had considered that, but the structure we attempted to put in the union is in use in other parts of the code, so removing the constructor (assuming the compiler treats the struct as POD and doesn't initialize all elements to 0) could break code which depends on that. – Superpolock Nov 26 '08 at 17:08

Hari · Answer 6 · 2023-05-25T16:57:42.010

This is an interesting question and there is a lot of useful information in the other answers. Further, it will be useful to know the effect of specifying the default constructor via the =default syntax.

For a class that is a member of a union, such a "defaulted" default constructor is preferable to a user-defined default constructor with no initialization list and an empty body. Note: if the user-defined default constructor is non-trivial (for example, it calls memset), then dan-man's answer shows what needs to be done (even though the example there defines the default constructor as having no initialization list and an empty body).

Regarding question 1, a "defaulted" default constructor will highlight the difference between default and value initialization.

For a class called C, if the default constructor is explicitly defined by the user as C() {} (i.e., with an empty body and no initialization list) then it will lead to default initialization when an object is created this way: C c_obj{};. However, if the default constructor is specified as C()=default; then C c_obj{}; leads to value initialization of c_obj.
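A small sketch of that difference follows, assuming a C++11 or later compiler. The struct and function names are hypothetical. Only the defaulted variant is read directly after `{}`; the user-provided variant is assigned first, because its member is indeterminate after construction.

```cpp
struct Defaulted {
    Defaulted() = default;   // not user-provided
    int x;
};

struct UserProvided {
    UserProvided() {}        // user-provided: leaves x indeterminate
    int x;
};

inline int defaulted_value() {
    Defaulted d{};           // x is zero-initialized
    return d.x;
}

inline int user_provided_value() {
    UserProvided u{};        // the constructor runs; x is still indeterminate
    u.x = 5;                 // so it must be assigned before it is read
    return u.x;
}
```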

Regarding question 2, dan-man's answer is very useful. It will get simplified as follows with a "defaulted" default constructor,

#include <new> // Required for placement 'new'.

struct Point {
    Point()=default; // not `Point() {};`
    Point(int x, int y): x_(x), y_(y) {}
    int x_, y_;
};

union U {
    int z;
    double w;
    Point p;
    // No need to specify a default constructor.
    // It is needed with `Point() {};` which is considered 
    // as a user defined default constructor.
};

int main() {
    // ...
    U u; // implicitly generated default constructor of U is called.

    new(&u.p) Point(); // activate the Point member of U
                       // using placement new.
    // ...
}

score -3 · Answer 7 · answered Nov 26 '08 at 17:51


You'll have to wait for C++0x to be supported by compilers to get this. Until then, sorry.

answered Nov 26 '08 at 17:51

Michel

