Could someone explain this C++ union example?

Question

I found this code on cppreference.com. It's the strangest C++ I've seen, and I have a few questions about it:

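#include <iostream>
#include <new>
#include <string>
#include <vector>

union S
{
    std::string str;
    std::vector<int> vec;
    ~S() {}  // empty on purpose: the union cannot tell which member is active
};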

int main()
{
    S s = { "Hello, world" };
    // at this point, reading from s.vec is undefined behavior
    std::cout << "s.str = " << s.str << '\n';
    s.str.~basic_string<char>();
    new (&s.vec) std::vector<int>;
    // now, s.vec is the active member of the union
    s.vec.push_back(10);
    std::cout << s.vec.size() << '\n';
    s.vec.~vector<int>();
}

I want to make sure I've got a few things right.

1. The union forces you to initialise one of its members by deleting the default constructor; in this case he initialised the string with "Hello, world".
2. After he's initialised the string, the vector technically doesn't exist yet? I can access it, but it isn't constructed yet?
3. He explicitly destroys the string object by calling its destructor. In this case, when S goes out of scope, will the ~S() destructor be called? If so, on which object? If he doesn't call the destructor explicitly on the string, is it a memory leak? I'm leaning towards no because strings clean themselves up, but for unions I don't know. He calls the destructor for both the string and the vector himself, so the ~S() destructor seems useless, but when I delete it my compiler won't let me compile it.
4. This is the first time I've seen someone use the new operator to place an object on the stack. In this case, is this the only way the vector can now be used?
5. When you use placement new as he does with the vector, you're not supposed to call delete on it because no new memory has been allocated. Usually if you placement new on the heap you have to free() the memory to avoid a leak, but in this case what happens if he lets the vector and union go out of scope without calling the destructor?

I find this really confusing.

Gotta say... Never seen a `union` with a destructor before. Good, bad, I dunno. Just never seen one. — user4581301, Sep 21 '17 at 17:21
In practice your class should know which union member it uses, e.g. be a [tagged union](https://en.wikipedia.org/wiki/Tagged_union); see [std::variant](http://en.cppreference.com/w/cpp/utility/variant); read all of [union reference](http://en.cppreference.com/w/cpp/language/union) page, notably the last tagged union example — Basile Starynkevitch, Sep 21 '17 at 17:21
One question per post please. And you need to read a [good c++ book](https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list) instead of asking random internet people — Passer By, Sep 21 '17 at 17:21
@passerby this is all so tightly bound it's pretty much one question looked at from different sides. — user4581301, Sep 21 '17 at 17:25
(2) accessing the non-active members(s) of a union is undefined behaviour. The current active member is the last member of the union that was written to. — Richard Critten, Sep 21 '17 at 17:28
@user4581301 I realize that, which is more reason a book is more suited. The answer would have to essentially be a complete specification of unions — Passer By, Sep 21 '17 at 17:28
Also note the semantics of `new` can be cleanly separated out. It shouldn't be in the question — Passer By, Sep 21 '17 at 17:30
@RichardCritten "_accessing the non-active members(s) of a union is undefined behaviour_" Except for the purpose of accessing common initial member, right? — curiousguy, Jan 17 '19 at 16:59

Daniel H · Answer 1 · 2017-09-21T18:00:22.073


1. Yes, exactly.
2. Because the vector and the string use the same underlying storage (which is how unions work), and that storage currently contains a string, there is no place for a vector to be, and trying to access it would be undefined behavior. It’s not that it hasn’t been constructed yet; it’s that it cannot be constructed because there’s a string in the way.
3. Whenever an S goes out of scope, its destructor is called. In this case, that’s the union’s destructor, which was explicitly defined to do nothing (because the union can’t know which member is active, so it can’t actually do what it’s supposed to). For the same reason, if you don’t explicitly call the destructor of the string, the union cannot know there was a string there and the string will not be cleaned up. The compiler makes you write your own destructor when there are union members with non-trivial destructors, because it can’t know how to clean that up and hopes that you do; in this example you don’t know how to clean it up either, so you do nothing in the union’s destructor and make the person who uses S call the destructor on the correct member manually.
4. This is called “placement new”, and is the typical way to construct an object in an existing memory location instead of allocating a new one. There are uses for it besides unions, but I believe that it’s the only way to get a vector into this union without invoking undefined behavior.
5. As addressed in part 3, when s goes out of scope, it doesn’t know if it holds a string or a vector. The ~S destructor does nothing, so you need to destroy the vector with its own destructor, like with the string.
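The claim that the union's empty destructor leaves the active member untouched can be checked with a small experiment. This is only a sketch: `Noisy`, `U`, `noisy_destroyed` and `destructor_calls` are hypothetical names invented for the illustration and are not part of the cppreference example.

```cpp
#include <cassert>

// Hypothetical counter: incremented every time a Noisy destructor runs.
int noisy_destroyed = 0;

// Hypothetical member type with a non-trivial destructor.
struct Noisy {
    ~Noisy() { ++noisy_destroyed; }
};

union U {
    Noisy n;
    int i;
    U() : n() {}   // make n the active member
    ~U() {}        // user-provided, deliberately does nothing
};

// Returns how many Noisy destructors ran while one U lived and died.
int destructor_calls(bool destroy_manually) {
    noisy_destroyed = 0;
    {
        U u;                           // u.n is constructed and active
        assert(noisy_destroyed == 0);  // construction runs no destructor
        if (destroy_manually) {
            u.n.~Noisy();              // explicit call, as in the question
        }
    }                                  // ~U() runs here and touches no member
    return noisy_destroyed;
}
```

With these definitions, `destructor_calls(false)` returns 0 and `destructor_calls(true)` returns 1: the member's destructor runs only when it is called by hand.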

To see why the union can’t automatically know which destructor to call, consider this alternate function:

void maybe_string() {
    S s = {"Hello, world"};
    bool b;
    std::cin >> b;
    if (b) {
        s.str.~basic_string<char>();
        new (&s.vec) std::vector<int>;
    }
    b = false;
    // Now there is no more information in the program for what destructor to call.
}

At the end of the function, the compiler has no way to know if s contains a string or a vector. If you don’t call a destructor manually (assuming you had a way to tell, which I don’t think you do here), it will have to play it safe and not destroy either member. Instead of having complicated rules about when the compiler would be able to destroy the active member and when it wouldn’t destroy anything, the creators of C++ decided to keep things simple and just never destroy the active member of a union automatically and instead force the programmer to do it manually.
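If the programmer records which member is active, the destructor can do the cleanup itself. The following is a minimal sketch of that idea, assuming a hand-written tag; `StringOrVec`, `become_vec`, `demo_switch` and the other names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

// Hypothetical wrapper: pairs the union with a tag that records
// which member is currently alive.
class StringOrVec {
public:
    explicit StringOrVec(const char* text) : tag_(Tag::Str) {
        new (&data_.str) std::string(text);
    }
    StringOrVec(const StringOrVec&) = delete;             // copying omitted for brevity
    StringOrVec& operator=(const StringOrVec&) = delete;
    ~StringOrVec() { destroy_active(); }

    // Ends the lifetime of the active member, then starts the vector's.
    void become_vec() {
        destroy_active();
        new (&data_.vec) std::vector<int>();
        tag_ = Tag::Vec;
    }

    bool holds_string() const { return tag_ == Tag::Str; }
    std::string& str() { assert(tag_ == Tag::Str); return data_.str; }
    std::vector<int>& vec() { assert(tag_ == Tag::Vec); return data_.vec; }

private:
    enum class Tag { Str, Vec };

    union Data {
        std::string str;
        std::vector<int> vec;
        Data() {}    // starts with no active member
        ~Data() {}   // the enclosing class destroys the active member
    };

    void destroy_active() {
        if (tag_ == Tag::Str) data_.str.~basic_string();
        else data_.vec.~vector();
    }

    Tag tag_;
    Data data_;
};

// Usage sketch: returns the vector's size after switching members.
std::size_t demo_switch() {
    StringOrVec v("Hello, world");
    v.become_vec();
    v.vec().push_back(10);
    return v.vec().size();
}   // ~StringOrVec() destroys the vector here
```

The union itself still has an empty destructor; the knowledge of which member to destroy lives in the enclosing class, which is the tagged-union pattern.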

edited Sep 21 '17 at 18:00

answered Sep 21 '17 at 17:30


But I don't think that the string will leak if S goes out of scope if he doesn't explicitly call the destructor. Sure the ~S destructor doesn't do anything, but a string object in a normal class will clean itself up, right? And should be the case with the vector too I think? – Zebrafish Sep 21 '17 at 17:35
@Zebrafish, In a normal class, you know which object lives in the space. In the union, both objects occupy the same space and only one is actually there. The compiler can't know which one to destroy, so no, it's not okay to leave out the destructor call of either object in the union. – chris Sep 21 '17 at 17:43
Oh I see now. So it forces you to have a destructor, except in this case it doesn't do anything particularly. But I suppose you could keep a record of which member is active and call its destructor based on that. – Zebrafish Sep 21 '17 at 17:49
@Zebrafish I added an example where there is no way the compiler could know what destructor to call, so you can see why it makes the programmer do it instead. – Daniel H Sep 21 '17 at 18:00
Thanks. I guess that's a really rare case of a resource leaking on the stack, isn't it? – Zebrafish Sep 21 '17 at 18:08
@Zebrafish there is no leak on the stack (automatic memory would be reclaimed), but `string` object is very likely to do _dynamic_ allocation. Without proper disposal, program would not know that it is supposed to be deleted and will leak it. – Revolver_Ocelot Sep 21 '17 at 18:18
@Zebrafish Any time you work around the type system, such as with unions or pointer casting, is a time that you can leak resources. Most of those are also times you flirt with UB; you need to be careful of both issues around unions, [storage reuse](http://en.cppreference.com/w/cpp/language/lifetime#Storage_reuse), and [uninitialized storage](http://en.cppreference.com/w/cpp/types/aligned_storage). In most code the three of these combined are less common than heap usage, which is why you usually think of resource leaks as related to heap usage, but all of these have similar issues. – Daniel H Sep 21 '17 at 18:20
@Revolver_Ocelot Yes, stack memory isn’t leaked, but an object is created on the stack and then not cleaned up properly, causing a resource leak from something which was on the stack. Heap memory is not the only resource that can leak this way; if there were an `fstream` as a union member then you might leak a file descriptor from the stack. Usually it’s much easier to not call a destructor on a heap object. – Daniel H Sep 21 '17 at 18:23
@DanielH Originally I had "not establishing invariants" instead of "leaking memory", with `shared_ptr` failing to properly decrement use count and failing to meet its designed goal as an example, but comment got too big, so I simplified it. – Revolver_Ocelot Sep 21 '17 at 18:28
@Daniel H I'm having trouble following this. A string usually does a heap allocation, so it would be a leak on the heap, but in the case of say struct ABigClass{ int buffer [1000] ; } would you leak a thousand bytes on the stack? Revolver said it would be reclaimed automatically. – Zebrafish Sep 21 '17 at 18:40
@Zebrafish an int array is a _TriviallyDestructible_ type. You are allowed to not call destructors on such types, because they do not have any resources to free, or invariants to establish on destruction. You do not leak memory or cause UB if you let a union whose active member is _TriviallyDestructible_ go out of scope. – Revolver_Ocelot Sep 21 '17 at 18:45
@Zebrafish Because of the way a stack works, the memory for a stack object *itself* is always reclaimed. In the case of a `std::string`, this is usually `3*sizeof(char*)` bytes, because a string usually contains a pointer to the start, a pointer to the last character, and a pointer to the end of allocated memory (this is a simplification because of short-string optimization and shared strings, but it’s approximately correct). These 24-or-however-many bytes are reclaimed, but the actual characters in the string are leaked. In your example, the 1000 `int`s are on the stack, and not leaked. – Daniel H Sep 21 '17 at 18:47

NathanOliver · Answer 2 · 2017-09-21T17:50:48.957

The union forces you to initialise one of the union members by deleting the default constructors, in this case he initialised the string with Hello World.

Correct

After he's initialised the string, the vector technically doesn't exist yet? I can access it, but it isn't constructed yet?

Well, the fact that it is accessible doesn't mean you can access it. Since it is not the active member, accessing it is undefined behavior. The reason is that its lifetime has not begun, because its constructor has not yet been called.

will the ~S() destructor be called?

Not at that point. Explicitly calling the string's destructor does not run ~S(); s is destroyed, and ~S() called, only when s goes out of scope.

If he doesn't call the destructor explicitly on the string is it a memory leak?

Yes, although strictly speaking it is undefined behavior. Because the destructor is not trivial, you can't switch members without first destroying the active one. If you don't destroy the string before you create the vector, you lose the string's state, including any memory it was holding (if it held any; with the small string optimization it might not).

so the ~S() destructor seems useless, but when I delete it my compiler won't let me compile it.

It is useless, as you say, but it really is all you can do. The union has to have a destructor, and the compiler-provided one is deleted because std::string and std::vector have non-trivial destructors.
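One way to see the deleted destructor without reading compiler errors is to ask the type traits. This is a sketch that assumes the C++17 rules for unions; `NoDtor` is a hypothetical name for the same union with the destructor removed.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical name: the same members as S, but no user-provided destructor.
union NoDtor {
    std::string str;
    std::vector<int> vec;
};

// The union from the question, with its empty user-provided destructor.
union S {
    std::string str;
    std::vector<int> vec;
    ~S() {}
};

// The implicit destructor of NoDtor is deleted, so it cannot be destroyed.
static_assert(!std::is_destructible<NoDtor>::value,
              "implicit destructor is deleted");
// S can be destroyed because its destructor is user-provided.
static_assert(std::is_destructible<S>::value,
              "user-provided destructor is usable");
// The default constructor of S is still deleted, which is why the
// example has to initialise a member explicitly.
static_assert(!std::is_default_constructible<S>::value,
              "default constructor is deleted");
```

All three assertions are evaluated at compile time, so the file compiling at all is the result.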

In this case is this the only way now that the vector can be used?

Yes. You have to use placement new in order for the object to be constructed. If you didn't and tried to do something like

s.vec = std::vector<int>{};

Then you would be assigning to an object that was never constructed which is undefined behavior.
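For contrast, the assignment becomes valid once the vector's lifetime has been started with placement new. A minimal sketch of the full sequence, where the function name `switch_then_assign` is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <vector>

union S {
    std::string str;
    std::vector<int> vec;
    ~S() {}
};

// Returns the number of elements in the vector after the switch.
std::size_t switch_then_assign() {
    S s = { "Hello, world" };            // str is the active member
    s.str.~basic_string();               // end the string's lifetime
    new (&s.vec) std::vector<int>;       // begin the vector's lifetime
    s.vec = std::vector<int>{1, 2, 3};   // fine now: vec is a live object
    std::size_t n = s.vec.size();
    s.vec.~vector();                     // destroy the active member before s dies
    return n;
}
```

Here `switch_then_assign()` returns 3. The elements could equally be passed straight to the placement new expression, which would avoid the separate assignment.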

vector and union go out of scope without calling the destructor?

Well, if the vector isn't destroyed manually, you leak whatever it holds, because nothing destroys it for you. As long as you destroy the active member before the union goes out of scope, you are fine.
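If C++17 is available, std::variant (mentioned in the comments on the question) does this bookkeeping for you. A rough equivalent of the example might look like the sketch below; the alias `StrOrVec` and the function `variant_demo` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical alias for a variant with the same two alternatives as S.
using StrOrVec = std::variant<std::string, std::vector<int>>;

// Returns the vector's size after replacing the string with a vector.
std::size_t variant_demo() {
    StrOrVec v = std::string("Hello, world");     // holds a string
    assert(std::holds_alternative<std::string>(v));
    v.emplace<std::vector<int>>();                // destroys the string, constructs the vector
    std::get<std::vector<int>>(v).push_back(10);
    return std::get<std::vector<int>>(v).size();
}   // the variant's destructor destroys the vector
```

No explicit destructor call or placement new appears here, because the variant tracks its active alternative and destroys it when needed.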
