
This program returns 0 on my machine:

#include <stdbool.h>

union U {
    _Bool b;
    char c;
};

int main(void) {
    union U u;
    u.c = 3;
    _Bool b = u.b;
    if (b == true) {
        return 0;
    } else {
        return 1;
    }
}

AFAICT, _Bool is an integer type that can store at least 0 and 1, and true is the integer constant 1. On my machine, sizeof(_Bool) == 1 and CHAR_BIT == 8, which means that a _Bool object has 256 possible representations.

I can't find much in the C standard about the trap representations of _Bool, and I can't find whether creating a _Bool with a representation different from 0 or 1 (on implementations that support more than two representations) is ok, and if it is ok, whether those representations denote true or false.

What I can find in the standard is what happens when a _Bool is compared with an integer: the integer is converted to the 0 representation if it has value 0, and to the 1 representation if its value is nonzero. The snippet above therefore ends up comparing two _Bools with different representations: _Bool[3] == _Bool[1].

I can't find much in the C standard about what the result of such a comparison is. Since _Bool is an integer type, I'd expect the rules for integers to apply, such that the equality comparison only returns true if the representations are equal, which is not the case here.

Since this program returns 0 on my platform, it would appear that this rule does not apply here.

Why does this code behave like this? (I.e., what am I missing? Which representations of _Bool are trap representations and which ones aren't? How many representations can represent true and false? What role do padding bits play in this? Etc.)

What can portable C programs assume about the representation of _Bool?

gnzlbg
  • A union will cause both c and b to occupy the same area of memory, assigning 3 to char means the value of b will also be 3, true if NOT false, so it should equate to -1. – SPlatten Dec 30 '18 at 17:27
  • I could be wrong, but your program smells like undefined behavior to me – Basile Starynkevitch Dec 30 '18 at 17:32
  • In order to avoid integer overflows, any assignments to a _Bool that are not 0 (false) are stored as 1 (true). That being said, your code seems to be creating its own confusion here, since the code says `if (true) return false`, and then you question why you are getting a `false` response. However, I believe that @interjay actually has the correct answer, and you have undefined behavior due to your union causing the `_Bool` to return the lowest bit. This seems to imply that the union is only useful to identify if a value is odd or even. – Claies Dec 30 '18 at 17:39
  • No, equality comparisons return true if the **values** are equal. Representations are irrelevant for this purpose – n. m. could be an AI Dec 30 '18 at 17:44
  • [My answer to Is it safe to memset bool to 0?](https://stackoverflow.com/a/33398698/1708801) has a lot of potentially useful background information. – Shafik Yaghmour Dec 30 '18 at 18:03
  • @ShafikYaghmour That question is for C++ though. While some of it is relevant, in C things are simpler because `_Bool` is a simple integer type that can hold the values 0 (false) and 1 (true). In C++ all you are guaranteed is that `true` converts to 1 and `false` to 0. – interjay Dec 30 '18 at 18:10
  • Your sample program is *unspecified* behavior as per ISO/IEC 9899:TC2 Annex J1. So the question is if you are reasoning about well-definedness or about implementation details? – Kamajii Dec 30 '18 at 19:10
  • This is clearly UB (as @BasileStarynkevitch also said) because it breaks the **strict aliasing rules**, and with UB you can expect anything. The standard doesn't clearly define how a `_Bool` object is laid out in memory (it says "While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.", not "The number of bits in a _Bool object **shall be CHAR_BIT**, the width of a _Bool **shall be just 1 bit**."), so no assumption can be made, and the behavior is **compiler dependent**. Your code **is not portable**. – Frankie_C Dec 30 '18 at 19:23
  • @Frankie_C Type-punning by reading a different union field is allowed by the C11 standard and not UB. – interjay Dec 30 '18 at 19:55
  • @interjay The point is that the standard doesn't specify exactly how a Boolean should be represented, so any assumption could be wrong on a different compiler or system. If the exact representation is unknown and you use a union with what you **suppose** to be a **compatible type**, you're trying to access the same memory through references to different objects, which **breaks the strict aliasing rule**. Because the OP's question is about **portability**, and we have a possibly wrong assumption, the code **is not portable**. – Frankie_C Dec 31 '18 at 07:45
  • @Claies "since the code says if (true) return false, and then you question why you are getting a false response." FWIW the question isn't why the response is false, but why the first expression in the if is `true`. – gnzlbg Dec 31 '18 at 09:30
  • @Kamajii could you be more specific about which part of Annex J1 makes this sample program have unspecified behavior ? – gnzlbg Dec 31 '18 at 09:33
  • @Frankie_C The standard does specify certain things about the representation of `_Bool`. I used a single `char` in the `union` here for convenience, since on all platforms I tested `sizeof(_Bool) == sizeof(char)`, but I could have used an array of `sizeof(_Bool)` `char`s in the `union` instead. Also, using a `char` to read / write bytes into the storage of other objects is not a strict aliasing violation, because the rules have that as an exception. I could have used a `char*` to write to the `_Bool` representation instead of a union. – gnzlbg Dec 31 '18 at 09:39
  • @Frankie_C Again, there is no undefined behavior and no violation of strict aliasing because the C standard explicitly allows reading from a different union member than the one which was assigned. The fact that the representation is implementation-defined is correct, but not relevant to UB. – interjay Dec 31 '18 at 10:45
  • @gnzlbg from the aforementioned: "The following are unspecified: [...] The value of a union member other than the last one stored into." Especially note that it is *not undefined* but *unspecified*. This makes a difference, as explained in the section on Terms, definitions and symbols. – Kamajii Dec 31 '18 at 12:22

1 Answer

Footnote 122 in the C11 standard says:

While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.

So on a compiler where _Bool has only one value bit, only one of the bits of the char will have effect when you read it from memory as a _Bool. The other bits are padding bits which are ignored.

When I test your code with GCC, the _Bool member gets a value of 1 when assigning an odd number to u.c and 0 when assigning an even number, suggesting that it only looks at the lowest bit.

Note that the above is true only for type-punning. If you instead convert a char to a _Bool (implicitly or with a cast), the value will be 1 if the char was nonzero and 0 otherwise.

interjay
  • The question is whether `_Bool` has only 1 value bit in GCC. If it does, then all other representations are trap representations and the behavior of my example is undefined. If it does not, then `_Bool` has many "true" and "false" representations, instead of just 2. – gnzlbg Dec 31 '18 at 09:16
  • @gnzlbg It does have only 1 bit of value in GCC (you can check this by trying to create a bitfield with a `_Bool b : 2` field, which fails). But the other representations are not necessarily trap representations: The unused bits are padding bits, which can have any value. So the 0 and 1 values have many possible representations. It's possible for the compiler to define certain combinations of padding bits as trap representations, but it doesn't look like GCC does so. – interjay Dec 31 '18 at 10:43
  • Thanks, that makes sense. – gnzlbg Dec 31 '18 at 12:23
  • FYI I just filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88662 because it does look like GCC assumes that _Bool representations are either 0x0 or 0x1. – gnzlbg Jan 02 '19 at 13:54