7

C++'s unions are more restrictive than those of C, because they introduce the concept of an "active member" (the one last assigned to) as the only one safe to access. The way I see it, this behavior of unions is a net negative. Can someone please explain what is gained by having this restriction?

timrau
debiatan
  • It makes constructors and destructors have a chance to behave in a sane way, which is a non-issue in C. – nwp Oct 25 '17 at 13:01
  • C++ unions also have the common-initial sequence guarantee. So I don't see an issue, unless you are doing something fishy. – StoryTeller - Unslander Monica Oct 25 '17 at 13:02
  • Are you sure that the C standard allows aliasing of union members? AFAIK, the strict aliasing rule also exists in C... – Serge Ballesta Oct 25 '17 at 13:16
  • @DanielTrugman I thought the _aforementioned types_ are quite restrictive? – Passer By Oct 25 '17 at 13:25
  • @DanielTrugman: the exact same sentence exists in the n4296 C++14 draft, in 3.10 Lvalues and rvalues [basic.lval] §10 and 10.6. So if you think it is not allowed in C++, there is no reason for it to be allowed in C – Serge Ballesta Oct 25 '17 at 13:32
  • @PasserBy, the last quote wasn't good enough. From C11 section 6.5.2.3 §3 footnote 95: _"If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation."_ So type aliasing is allowed but susceptible to type punning. See more [here](https://stackoverflow.com/questions/25664848/unions-and-type-punning) – Daniel Trugman Oct 25 '17 at 13:32
  • @SergeBallesta C++ explicitly forbids punning through unions with active/inactive members, which C doesn't. I don't know what C does though – Passer By Oct 25 '17 at 13:34

2 Answers

4

Short answer

In C, the union is only a question of how to interpret the data that is stored at a given location. The data is passive.

In C++, union members can be of class types. Class objects carry not only data but also behavior. Because you rely on this (accessible) behavior, and may not even be able to access the private and protected members, the object must remain consistent from its construction to its destruction. The notion of an active member exists exactly for this purpose: to keep the object's lifecycle consistent.

Longer explanations

Imagine the following union:

#include <iostream>
#include <string>
using namespace std;

union U {
    string s;
    int i;

    // string is non-trivial, so U needs user-provided constructors and a destructor
    U (string s) : s(s) { cout << "s is active"<<endl;}
    U (int i) : i(i) { cout << "i is active"<<endl;}
    U() : s() { cout << "s is active by default" <<endl; }
    // the union itself cannot know which member is active here
    ~U() { cout << "delete... but what ?"<<endl; }
};

Now suppose that I initialize it:

U u("hello"); 

The active member is s at that moment. I can now use this active member without risk:

u.s += ", world";  
cout << u.s <<endl;

Before changing the active member, I have to be sure that the lifetime of the member is ended (requirement as per C++ standard). If I forget this, and for example use another member:

u.i=0;  // ouch!!! this is not the active member : what happens to the string ?  

This is undefined behavior (in practice, s is now corrupted and the memory that held its characters can no longer be reclaimed). The opposite case fails too. Suppose the active member is i, and I now want to use the string:

u.s="goodbye";  // thinks that s is an active valid string which is no longer the case 

Here, the compiler assumes that I know s is the active member. But because s is not a properly initialized string, invoking its assignment operator also results in undefined behavior.

Demo of what you should not do

How to do it right?

The standard explains it:

If M has a non-trivial destructor and N has a non-trivial constructor (for instance, if they declare or inherit virtual functions), the active member of u can be safely switched from m to n using the destructor and placement new-expression as follows:

u.m.~M();
new (&u.n) N;

So in our nasty example, the following would work:

u.s.~string(); // properly end the life of s
u.i=0;  // this is now the active member   
           // no need to end life of an int, as it has a trivial destructor 
new (&u.s) string("goodbye");  // placement new  
cout << u.s <<endl;    

Demo of how to (almost) do it right
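To keep the bookkeeping in one place, the destructor and placement-new steps above can be wrapped in a small class that remembers which member is active. This is only a minimal sketch, not something taken from the standard: the names IntOrString, Tag, set, as_int and as_string are made up for illustration, and copying is disabled to keep it short.

```cpp
#include <cassert>
#include <new>      // placement new
#include <string>
#include <utility>  // std::move

// Hypothetical wrapper: a tag records which union member is active.
class IntOrString {
public:
    explicit IntOrString(int v) : tag_(Tag::Int), i_(v) {}
    explicit IntOrString(std::string v) : tag_(Tag::String), s_(std::move(v)) {}

    // Copying is disabled to keep the sketch short.
    IntOrString(const IntOrString&) = delete;
    IntOrString& operator=(const IntOrString&) = delete;

    // The tag tells the destructor what (if anything) must be destroyed.
    ~IntOrString() { destroy_active(); }

    void set(int v) {
        destroy_active();  // end the lifetime of the old member first
        i_ = v;            // int is trivial: assignment makes it active
        tag_ = Tag::Int;
    }

    void set(std::string v) {
        destroy_active();
        new (&s_) std::string(std::move(v));  // start the string's lifetime
        tag_ = Tag::String;
    }

    bool holds_string() const { return tag_ == Tag::String; }
    int as_int() const { assert(tag_ == Tag::Int); return i_; }
    const std::string& as_string() const {
        assert(tag_ == Tag::String);
        return s_;
    }

private:
    // Only the string needs an explicit destructor call.
    void destroy_active() {
        if (tag_ == Tag::String) s_.~basic_string();
    }

    enum class Tag { Int, String } tag_;
    union {  // anonymous union: i_ and s_ share storage
        int i_;
        std::string s_;
    };
};
```

With this, something like `IntOrString v(std::string("hello")); v.set(0); v.set(std::string("goodbye"));` goes through the same destroy-then-construct sequence as the snippet above, and the final string is released when v goes out of scope. This is roughly the job that std::variant does for you.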

Christophe
  • 68,716
  • 7
  • 72
  • 138
  • I'm not sure I follow this line of reasoning. C++ objects inside unions can't have non-trivial constructors, destructors or virtual methods, so I would expect the behavior of such a union to be similar to that of a C union (mainly overlap data in memory). Calling C++ methods on those objects is not different from invoking C functions that change the contents of a regular union. I'm sure I'm missing something still... – debiatan Oct 25 '17 at 16:18
  • @debiatan I've edited with additional details. Hope this answers your doubts – Christophe Oct 25 '17 at 17:44
  • Thanks for expanding your explanation. I follow the logic of your argument, but your example does not address my previous comment, since std::string has a non-trivial copy constructor, which the standard forbids. I can't compile it under clang++ 3.5 or 3.8. See [this link](https://stackoverflow.com/a/7299212/3432687) for more details. – debiatan Oct 26 '17 at 05:20
  • @debiatan which version of the standard are you referring to (if possible with section number)? See: http://en.cppreference.com/w/cpp/language/union – Christophe Oct 26 '17 at 07:21
  • If I'm reading the document you link to correctly (and according to recent versions of clang++ and g++), including a non-trivially-copyable type (e.g. a string) inside a union only works from C++11 onward. That means that your example is rightfully sustained by that version of the standard. Would you say, however, that _the concept of an active member of a union_, which predates C++11, was useful in any way at the moment of its inception? (I believe this is almost a rewording of my original question, but I'll take your answer now, because it's sufficient for modern common practice). – debiatan Oct 27 '17 at 23:53
  • @debiatan in the original The C++ Programming Language published by Stroustrup in 1986, p.154, a union is defined as a struct where all members have the same address and "only one member will have a useful value at any one time". Despite the restrictions at that time (no constructor for any member), the idea of having only one active object was already almost there. At that time unions were typically used for low-level code and C compatibility. But the necessity for a single active element emerged from clean use of the object model (i.e. private data + working only with the public interface) – Christophe Oct 28 '17 at 09:48
  • In the "to make the nasty example work" code snippet, does there need to be a final `u.s.~string();` to destruct the now-active `u.s` "goodbye" string? – Eljay Jul 16 '23 at 13:01
  • @Eljay yes. The reason is that in this snippet, we know which member is active, and it's the string ("goodbye"), and we destroy it before the containing object is destroyed. If we did not do it, the object would be destroyed without knowing which component is active, i.e. the string would never be destroyed. Unions are low-level tools inherited from C, which work well with members that have trivial constructors/destructors. ../.. – Christophe Jul 16 '23 at 13:24
  • ../.. In real life, you would use unions of different structs that each have a first member in common. The standard makes sure that these common members can be accessed from any member of the union. The common member is then used as a hint to determine which is the active member, and the union destructor can then take advantage of this information to do its job correctly. In this case we would need an explicit destructor call at the end. Unions should imho be avoided in C++: there are std::variant and similar constructs that do this job much better. – Christophe Jul 16 '23 at 13:27
0

The fundamental reason, aside from union members of non-trivial class type, is that C’s rules for unions defeat type-based alias analysis. C tries to have it both ways here, but the possibility of accessing a union member through an ordinary pointer (or reference, in C++) makes those rules not work.
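A rough illustration of the aliasing problem, assuming a typical optimizing compiler (the names IntOrFloat, store_both and demo_distinct are hypothetical, chosen only for this sketch):

```cpp
#include <cassert>

union IntOrFloat {
    int i;
    float f;
};

// int and float are unrelated types, so type-based alias analysis lets the
// compiler assume pi and pf point to different objects. It may therefore
// return 1 without reloading *pi after the store through pf.
int store_both(int* pi, float* pf) {
    *pi = 1;
    *pf = 2.0f;
    return *pi;
}

// Well-defined use: the two pointers really do refer to distinct objects.
int demo_distinct() {
    int i = 0;
    float f = 0.0f;
    int r = store_both(&i, &f);
    assert(f == 2.0f);
    return r;
}

// Problematic use (deliberately not executed):
//   IntOrFloat u;
//   store_both(&u.i, &u.f);
// Inside store_both nothing shows that both pointers came from the same
// union, so the compiler cannot tell that the stores overlap.
```

Once the members are reached through plain pointers, the function body gives no hint that a union is involved, which is why a rule that permits reading any member cannot coexist cleanly with type-based alias analysis.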

Davis Herring
  • 36,443
  • 4
  • 48
  • 76