98

While reading the Seastar source code, I noticed a union called tx_side that has only one member. Is this a hack to work around some problem?

For reference, here is the tx_side definition:

union tx_side {
    tx_side() {}
    ~tx_side() {}
    void init() { new (&a) aa; }
    struct aa {
        std::deque<work_item*> pending_fifo;
    } a;
} _tx;
Muntasir
xiaoming-qxm
  • 1
    Potential duplicate of https://stackoverflow.com/questions/26572432/is-there-any-difference-between-structure-and-union-if-we-have-only-one-member. – Max Langhof Nov 27 '19 at 09:32
  • 7
    @MaxLanghof That question and the corresponding answers don't mention the purpose of using such a union. – xiaoming-qxm Nov 27 '19 at 09:37
  • Do you have an example of how this member is used? – n314159 Nov 27 '19 at 09:39
  • 4
    That's why I didn't actually use my binding close vote. But I'm not sure what exactly you expect from answers to your question that doesn't follow directly from the answers over there. Presumably the purpose of using `union` instead of `struct` is one or more of the differences between the two. It's a pretty obscure technique so unless the original author of that code comes along I'm not sure somebody can give you an authoritative answer which problem they're hoping to solve with this (if any). – Max Langhof Nov 27 '19 at 09:40
  • @n314159 The *seastar source code* link in the question is the example. – xiaoming-qxm Nov 27 '19 at 09:43
  • @daoliker In the linked file, the member `_tx` is only defined, not used. It would be helpful to see an example of actual usage of this variable. – n314159 Nov 27 '19 at 09:47
  • @n314159 You can find its usage in the smp.cc or reactor.cc – xiaoming-qxm Nov 27 '19 at 09:49
  • 2
    My best guess is that union is used to either delay construction (which is somewhat pointless in this case) or prevent destruction (which leads to memory leak) of pending_fifo. But hard to say without example of usage. – Konstantin Stupnik Nov 27 '19 at 09:53
  • Two-phase initialization, C++11 edition – M.M Nov 27 '19 at 10:51
  • it's not clear where or how the `deque` is destroyed? – M.M Nov 27 '19 at 10:52

2 Answers

96

Because tx_side is a union, tx_side() doesn't automatically initialize/construct a, and ~tx_side() doesn't automatically destroy it. This allows fine-grained control over the lifetime of a and pending_fifo via placement-new and manual destructor calls (a poor man's std::optional).

Here's an example:

#include <iostream>

struct A
{
    A() {std::cout << "A()\n";}
    ~A() {std::cout << "~A()\n";}
};

union B
{
    A a;
    B() {}
    ~B() {}
};

int main()
{
    B b;
}

Here, B b; prints nothing, because a is neither constructed nor destroyed.

If B were a struct, B() would call A() and ~B() would call ~A(), and you couldn't prevent that.

k_ssb
HolyBlackCat
  • Is the memory of object `b` filled with random bytes before I call the constructor `A()`? – xiaoming-qxm Nov 27 '19 at 10:00
  • 25
    @daoliker not necessarily random, but unpredictable by you. Same as any other uninitialized variable. You can't assume it's random; for all you know it could hold the user's password that you previously asked them to type in. – user253751 Nov 27 '19 at 10:20
  • 5
    @daoliker: The previous comment is too optimistic. Random bytes would have values in the range 0-255, but if you read an uninitialized byte into an `int` you may get `0xCCCCCCCC`. Reading uninitialized data is Undefined Behavior, and what might happen is that the compiler simply discards the attempt. This is not just theory. Debian made this exact mistake, and it broke their OpenSSL implementation. They had some real random bytes, added an uninitialized variable, and the compiler said "well the result is undefined, so it might as well be zero". Zero obviously isn't random anymore. – MSalters Nov 28 '19 at 10:49
  • 1
    @MSalters: Do you have a source for this claim? Because what I can find suggests that is not what happened: it wasn't the compiler that removed it, but the developers. Honestly, I'd be amazed if any compiler writer made such an incredibly bad decision. (see https://stackoverflow.com/questions/45395435/why-is-uninitialized-memory-safe-to-use-in-openssls-random-number-generator ) – Jack Aidley Nov 28 '19 at 12:21
  • 5
    @JackAidley: Which precise claim? You do have a good link, seems I got the story inverted. OpenSSL got the logic wrong, and used an uninitialized variable in such a way that a compiler could legally assume any result. Debian correctly spotted that, but broke the fix. As for "compilers making such bad decisions"; they don't make that decision. The Undefined Behavior is the bad decision. Optimizers are designed to run on correct code. GCC for instance actively assumes no signed overflow. Assuming "no uninitialized data" is equally reasonable; it can be used to eliminate impossible code paths. – MSalters Nov 28 '19 at 12:37
  • 1
    @JackAidley I've encountered similar issues to what @MSalters mentions in my own code; I erroneously assumed an uninitialized variable would be empty, and was baffled when a subsequent `!= 0` comparison yielded true. I've since added compiler flags to treat uninitialized variables as errors to make sure I won't fall into that trap again. – Tom Lint Nov 28 '19 at 12:41
  • 1
    @MSalters: Undefined behaviour only means that it is not defined by the Standards Committee; compiler writers are typically more pragmatic than the committee and don't behave egregiously just because the committee says it's UB. – Jack Aidley Nov 28 '19 at 13:14
  • @JackAidley This would be an example of clang assuming `uninitialized` contains zero: https://godbolt.org/z/s-pFbe (and unfortunately does not warn about it ...) -- correction: it actually sees it is undefined and does not even care what `foo` returns: https://godbolt.org/z/EQZmJr – chtz Nov 28 '19 at 15:45
  • @MSalters The Linux people were for a while not happy about the anal standard exegesis by the gcc team; another gotcha is aliasing. – Peter - Reinstate Monica Nov 28 '19 at 16:43
  • @MSalters: What term does the Standard use for non-portable *but correct* actions upon which it imposes no requirements? – supercat Jul 30 '21 at 21:55
  • @JackAidley: People wishing to sell compilers to people who would need to write code for them behave sanely. Compiler writers who are exempt from market pressures, however, often exploit the Standard as an excuse to deride as "broken" constructs which would be non-portable but correct if written for any commercial compiler. – supercat Jul 30 '21 at 21:58
  • @supercat: Provided it's correct, that would be _Unspecified Behavior_ as opposed to Undefined Behavior. – MSalters Jul 31 '21 at 19:57
  • @MSalters: The Standard uses the phrase "unspecified behavior" to indicate that an implementation may choose in arbitrary means from among a set of alternatives that is either explicitly given (e.g. `f()+g()` may either call f() and then g(), or call g() and then f(), but those are the only two choices) or implied (an "unspecified value" must be chosen from among the set of bit patterns a type could hold, however large or small that might be). Why would the Standard use the phrase "non-portable or erroneous" if they simply meant "erroneous"? – supercat Jul 31 '21 at 20:04
  • @MSalters: Even in cases where 99% of implementations can and do process a construct identically, the Standard may still give implementations unlimited license to deviate from such behavior if the benefits of doing so would substantially exceed the benefits of following precedent, on the presumption that people seeking to sell compilers would only do so in cases that would genuinely benefit their customers. On a two's-complement platform where neither `int` nor `unsigned` has padding bits, the behavior of -1<<1 was unambiguously defined under C89. C99 recharacterized it as UB because... – supercat Jul 31 '21 at 20:15
  • ...on other platforms it might make sense to process such shifts in ways that might raise a signal at a time other than when the shift is performed (e.g. if the compiler holds off on performing a shift until it knows whether the result will be needed). Because any action that would allow the effects of a potential useful optimization to be observable *must* be classified as UB, what had been fully defined behavior on C89 was reclassified as UB without a peep in the Rationale. Was that intended to forbid the construct on two's-complement platforms without padding bits? If so, why? – supercat Jul 31 '21 at 20:17
  • @supercat: While I was active in WG21 around that time, I never participated in WG14, so I can't answer that. – MSalters Aug 02 '21 at 07:16
  • @MSalters: Why does the Standard specify that actions characterized as Undefined Behavior may be processed "In a documented fashion characteristic of the environment" if it does not intend that implementations extend the language with such semantics when doing so would be useful? And what do you make of N1570 5.1.2.3 paragraph 9, "An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics." if not intended to invite such extensions? – supercat Aug 02 '21 at 13:21
  • @MSalters: Also, I'm curious what you perceive as the range of situations in which any parts of the Standard would actually have any normative authority with respect to non-trivial programs for freestanding implementations, or for freestanding implementations themselves. IMHO, the Standard's definitions of "conformance" are severely lacking, and this weakness is at the heart of most controversies regarding the Standard. – supercat Aug 02 '21 at 13:24
0

In simple words, a single-member union does not initialize its storage unless the member is explicitly constructed. Since C++17, the same effect can be achieved with std::optional.

Sitesh
  • 3
    That's a misleading answer. Union with only one member will have the same size as the member. This memory simply won't be initialized till the member is initialized. – Kirill Dmitrenko Dec 06 '19 at 15:51