Let's start immediately with the example code:

```c
#include <stdio.h>
#include <stdbool.h>

typedef union {
    bool b;
    int i;
} boolIntUnion;

int main(void) {
    boolIntUnion val;

    val.i = 0;
    printf("i = %d; %s\n", val.i, ((val.b) ? "true" : "false"));

    val.i = 1;
    printf("i = %d; %s\n", val.i, ((val.b) ? "true" : "false"));

    val.i = 2;
    printf("i = %d; %s\n", val.i, ((val.b) ? "true" : "false"));
}
```

My question is whether using the union as shown here is some sort of undefined behaviour. What interests me is the third case, where I set `val.i` to 2 and then use the boolean from the union, `val.b`, as the condition for my printed string. With MSVC 19 and gcc, I get the behaviour I expected: the 2 in the union results in a value of true from `val.b`. With Clang (and Clang-cl), I get false. I suspect a compiler error, because the assembly output of Clang contains `test cl, 1`, whereas MSVC and gcc give me `test eax, eax` and `test al, al` respectively. I tested this on Godbolt with all three compilers, including execution output; Clang, in the middle, is where you can see the differing behaviour. [1]

I'm not sure whether using a union in this way is undefined behaviour under any C or C++ standard; that's basically my question. If this is valid code, which I assume it is, I'd file a bug with Clang / LLVM, but I wanted to be sure beforehand. I tested this with both Clang and Clang++ and got the same behaviour.

[1] https://godbolt.org/z/c95eWa9fq

Paul Sanders
0xLeon
  • Type punning with a `union` is UB in C++. I'm not sure if the same is true in C. – Paul Sanders Oct 23 '22 at 11:27
  • Do not tag both C and C++ except when asking about differences or interactions between the two languages. Since the Compiler Explorer link shows C++ and the code is C++ (`bool` is not defined in C when `<stdbool.h>` is not included), I deleted the C tag. – Eric Postpischil Oct 23 '22 at 11:35
  • @PaulSanders Thanks for that comment. The term »type punning« was not known to me and has already sent me down a rabbit hole of new information, which is helpful in itself. Especially this other Stack Overflow [answer](https://stackoverflow.com/a/31080901/1128707) to a similar question. – 0xLeon Oct 23 '22 at 11:37
  • @EricPostpischil Well, this is exactly what I'm asking about. And as the previous comment and the other answer I found seem to suggest, C and C++ differ on whether type punning is undefined behaviour. – 0xLeon Oct 23 '22 at 11:40
  • @0xLeon: There is no actual question in your post, except the title contains an incomplete sentence with a question mark and does not mention C or C++. When posting to Stack Overflow, you should ask a specific explicit question, such as “Is using an inactive union member undefined behavior in C++?” or “Is there a difference between C and C++ in using a union member other than the last-stored member?” – Eric Postpischil Oct 23 '22 at 11:42
  • @EricPostpischil I have now clarified the title question and corrected the Godbolt link so that it contains the code I posted here from the start, which was valid C and C++. Sorry for the incorrect link. – 0xLeon Oct 23 '22 at 11:50
  • There seems to be no rule for this in C++: a valid C++ implementation could store true as 5 and false as 75. I am not sure, but reading raw memory as a bool when it holds anything other than the implementation's two 'official' values is probably UB as well. So do not use bool to exchange data with other programs or PCs, or to store data on disk in a defined file format. There is a difference between reinterpretation (the UB-prone reinterpret_cast and the safer bit_cast) and conversion (e.g. implicit or static_cast) between bool and an integral type; the two have different rules. – Sebastian Oct 23 '22 at 11:53
  • The effects of conversion between an integral type and bool are more strictly defined by the standard than the memory representation of bool. – Sebastian Oct 23 '22 at 11:56

1 Answer

In C++, using a union member other than the last-stored member is undefined behavior, with an exception for structures with common initial layouts.

C++ 2020 draft N4849 [class.union] (11.5) 2 says “… At most one of the non-static data members of an object of union type can be active at any time…” with a note that there is an exception for inspecting the members of structures in the union that have a common initial sequence. [basic.life] (6.7.3) 1.5 says the lifetime of an object o ends when (among other possibilities) “the storage which the object occupies … is reused by an object that is not nested within o (6.7.2)”. Thus, after `val.i = …;`, the lifetime of `val.b` has ended (if it began), and the behavior of accessing `val.b` is not defined by the C++ standard. Any output or other behavior from the program or compiler is allowed by the C++ standard.

In C, accessing a union member other than the last-stored member results in reinterpreting the applicable bytes of the union in the new type, per C 2018 note 99 (speaking about 6.5.2.3 3, which says that accessing a union with . or -> provides the value of the named member, without exception for whether it was the last-stored member or not).

Eric Postpischil
  • Thank you for that detailed answer. So you'd confirm this is undefined behaviour in C++, but defined behaviour in C. And for the latter, a `val.i = 42` should read back `val.b` as `true`? I updated my [Godbolt](https://godbolt.org/z/hKTGK14eq) to explicitly compile as C with all three compilers, which would confirm a bug for at least C compilation with Clang. – 0xLeon Oct 23 '22 at 12:08
  • See https://stackoverflow.com/questions/71523323/c-is-reading-a-bool-after-setting-it-with-memset-undefined-implementation-de and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88662 and https://stackoverflow.com/questions/53979781/what-can-be-assumed-about-the-representation-of-true – Sebastian Oct 23 '22 at 12:17
  • @Sebastian Yes, I'm aware of that. Maybe I should stick with my example from the initial question. I'm not reading `val.b` back to get its value, but rather using `val.b` as a condition immediately. Clang still generates a `test cl, 1` instruction, which would only result in the desired behaviour if the lowest bit of `val.i` is set. That would make this even more confusing, because then all even values assigned to `val.i` would result in `false`-ish behaviour, while odd values would result in `true`-ish behaviour. – 0xLeon Oct 23 '22 at 12:21
  • @Sebastian Updated the [Godbolt](https://godbolt.org/z/3Yno35onc) test; it's exactly as I assumed. – 0xLeon Oct 23 '22 at 12:23
  • @0xLeon I added two other interesting links to the previous comment. The gcc link in particular clearly says that this is deliberately undocumented behaviour in C as well, and that you can only really rely on 0 being false and on true always being stored as another value, which is the same for one compiled program (or one ABI?). (true, BTW, should not be generated after UB; e.g. assigning one wrongly created bool variable to a new one does not solve the issue.) – Sebastian Oct 23 '22 at 12:28
  • @0xLeon: Re “And for the latter, a `val.i = 42` should read back `val.b` as true?”: No. The C standard does not specify how many bytes are used for a `_Bool`. If it is one byte, the standard does not specify where that byte is located relative to the bytes used for an `int`. And it does not specify what happens if the bits inside the bytes used to represent a `_Bool` are anything other than the bits for the values 0 or 1. You should not reinterpret an `int` as a `_Bool` expecting to get a reliable value. To test whether an `int` is zero or not, use `val.i` or `val.i != 0`. – Eric Postpischil Oct 23 '22 at 14:47
  • @EricPostpischil Thanks, then I think we have finally narrowed this down to a bug in the code base I'm working with, and the differing behaviour of Clang was just highlighting existing undefined behaviour. – 0xLeon Oct 23 '22 at 15:31