
I was trying to write some macros for type-safe use of _Bool and then stress-test my code. For evil testing purposes, I came up with this dirty hack:

_Bool b=0;
*(unsigned char*)&b = 42;

Given that _Bool is 1 byte on the implementation (sizeof(_Bool)==1), I don't see how this hack violates the C standard. It shouldn't be a strict aliasing violation, since character types may alias any object type.

Yet when running this program through various compilers, I get problems:

#include <stdio.h>

int main(void)
{
  _Static_assert(sizeof(_Bool)==1, "_Bool is not 1 byte");

  _Bool b=0;
  *(unsigned char*)&b = 42;
  printf("%d ", b);
  printf("%d", b!=0 );

  return 0;
}

(The code relies on the default argument promotions: the _Bool arguments to printf are promoted to int.)

Some versions of gcc and clang give the output 42 42, others give 0 0, even with optimizations disabled. I would have expected 42 1.

It would seem that the compilers assume that _Bool can only be 1 or 0, yet at the same time they happily print 42 in the first case.

Q1: Why is this? Does the above code contain undefined behavior?

Q2: How reliable is sizeof(_Bool)? C17 6.5.3.4 does not mention _Bool at all.

Lundin
  • 6.7.2.1 has an interesting footnote that may be relevant: *"124) While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit."* – user694733 Sep 04 '18 at 10:19
  • @user694733 That's a non-normative footnote regarding the use of _bit-fields_. I don't see how it is relevant here. – Lundin Sep 04 '18 at 10:49
  • How can the output be `42 42`? The second printf can only print 1 or 0. – StoryTeller - Unslander Monica Sep 04 '18 at 10:53
  • @StoryTeller Indeed. But that's what I get with gcc/mingw, hence the question. Maybe a bug in the standard lib? – Lundin Sep 04 '18 at 10:55
  • [O_O I tried it](http://coliru.stacked-crooked.com/a/fa18f68fdfc0b6a6). My mind is seriously blown right now. This joins my collection of UB examples. – StoryTeller - Unslander Monica Sep 04 '18 at 10:58
  • @StoryTeller _Why_ is it UB though? – Lundin Sep 04 '18 at 11:01
  • I don't know. Though I was initially shocked by the second 42, it kinda makes sense in retrospect. Because `b != 0` for a `_Bool` can be optimized to simply `b`. I'm scratching my head still though. – StoryTeller - Unslander Monica Sep 04 '18 at 11:02
  • @RbMm Character types are an exception in the strict aliasing rules. The optimizer cannot cause UB based on that here. – user694733 Sep 04 '18 at 11:04
  • In a similar example, the `_Bool` optimization combines with optimizations for transforming branches into arithmetic operations, producing strange-looking results for very natural code. The optimization of `if (b) x++;` into `x += (the representation of) b;` confirms that Clang treats `_Bool` representations other than `0` and `1` as trap values triggering UB (a sketch of this pattern follows after these comments). https://gcc.godbolt.org/z/wPq4zq – Pascal Cuoq Sep 04 '18 at 11:25
  • @user694733 The aliasing rules are asymmetric with respect to character types. When the effective type of a datum is a non-character type, you can access that datum via a character type. But when the effective type of a datum is a character type, you can _not_ access that datum via a non-character type. – zwol Sep 04 '18 at 19:25
  • Please post the assembly code for the posted question. Then we can easily determine what the compiler was thinking. – user3629249 Sep 05 '18 at 02:01
  • Interesting corner case: the whole C99 `_Bool` semantics is a hack. It would have been fine to impose all this headache on implementors if they had also added boolean and/or bit-field arrays, but, as specified, it does not provide any real improvement over `enum { false, true }; typedef unsigned char _Bool;` – chqrlie Apr 06 '22 at 10:27
  • For Q2, `sizeof(_Bool)` is at least 1. C17 6.7.2.1/4 footnote 124 says: "While the number of bits in a `_Bool` object is at least `CHAR_BIT`, the width (number of sign and value bits) of a `_Bool` may be just 1 bit." (Of course, `_Bool` has no sign bit, only value bits and padding bits.) C23 final draft n3054 6.2.6.1/2 says: "The type `bool` shall have one value bit and `(sizeof(bool)*CHAR_BIT)` - 1 padding bits." (Spot the minor typographical error in the text: they used a hyphen character instead of a minus sign character.) – Ian Abbott Aug 16 '23 at 09:58
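To make Pascal Cuoq's observation concrete, here is a minimal sketch in the spirit of his godbolt example (the exact code behind the link is not reproduced here; only the `if (b) x++;` pattern is taken from the comment). If the compiler assumes a `_Bool` holds only 0 or 1, it may lower the branch to an unconditional `x += b`, so the raw representation 42 leaks into the arithmetic:

#include <stdio.h>

int main(void)
{
  _Bool b = 0;
  *(unsigned char*)&b = 42;  /* store via character type; reading b as _Bool is UB */

  int x = 0;
  if (b)       /* may be compiled as x += b, using the raw representation */
    x++;

  printf("%d\n", x);  /* 1 if the branch behaves naively; 42 if lowered to x += b */
  return 0;
}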

1 Answer


Q1: Why is this? Does the above code contain undefined behavior?

Yes, it does. The store itself is valid, but subsequently reading the object as a _Bool is not.

6.2.6 Representations of types

6.2.6.1 General

5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...]
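In other words, the hack only goes wrong at the point where the _Bool lvalue is read back. For contrast, here is a minimal sketch (not part of the original question) of the well-defined counterpart, which inspects the object representation through a character type only:

#include <stdio.h>

int main(void)
{
  _Bool b = 0;
  *(unsigned char*)&b = 42;                 /* store through a character type: allowed */

  unsigned char raw = *(unsigned char*)&b;  /* read through a character type: allowed */
  printf("%d\n", raw);                      /* reliably prints 42 */

  /* printf("%d\n", b); would read the trap representation via a _Bool lvalue: UB */
  return 0;
}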

Q2: How reliable is sizeof(_Bool)? C17 6.5.3.4 does not mention _Bool at all.

It will reliably tell you the number of bytes that are needed to store one _Bool. 6.5.3.4 also doesn't mention int, but you're not asking whether sizeof(int) is reliable, are you?
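For example (a trivial check, not part of the original answer), sizeof reports whole bytes of storage regardless of how many of those bits are value bits; per footnote 124 quoted in the comments, a _Bool may have a single value bit and sizeof(_Bool)*CHAR_BIT - 1 padding bits:

#include <stdio.h>
#include <limits.h>

int main(void)
{
  /* Storage size in bytes and bits; typically prints "1 bytes, 8 bits". */
  printf("%zu bytes, %zu bits\n", sizeof(_Bool), sizeof(_Bool) * CHAR_BIT);
  return 0;
}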

Community
  • So `_Bool a; *(unsigned char*)&a = 42; printf("%d", *(unsigned char*)&a);` is valid? – KamilCuk Sep 04 '18 at 11:05
  • @KamilCuk As far as I know, yes, that is perfectly valid. – Sep 04 '18 at 11:07
  • Isn't the existence of trap representations implementation-defined? In that case it wouldn't always be UB. – user694733 Sep 04 '18 at 11:09
  • @user694733 - It doesn't need to be a trap representation to have UB. The quote above doesn't at all require the access to trap. Simply yielding 42 for a `_Bool` is enough UB. – StoryTeller - Unslander Monica Sep 04 '18 at 11:10
  • @StoryTeller Quotation is missing lower half of the paragraph which says: *"Such a representation is called a trap representation."* – user694733 Sep 04 '18 at 11:11
  • @user694733 You're right that this is a trap representation regardless of what behaviour the implementation gives it. As for implementation-defined, I don't think so. There are types for which certain implementation-defined aspects allow us to deduce that there are no trap representations (such as `sizeof(int) * CHAR_BIT == 32 && INT_MIN+1 == -2147483647 && INT_MAX == 2147483647`), but no blanket requirement to document all trap representations. –  Sep 04 '18 at 11:13
  • @user694733 - Fair enough. Though I think the definition in [3.19.4](https://port70.net/~nsz/c/c11/n1570.html#3.19.4) is more satisfactory in this case. It's simply not a valid representation of a bool. Ergo, UB. – StoryTeller - Unslander Monica Sep 04 '18 at 11:14
  • I agree with the previous comments, this is merely the formal definition of a trap representation. I've never heard of booleans with trap representations. How do you explain this? http://coliru.stacked-crooked.com/a/8b5ede2a92714caf (a reconstruction follows after these comments). The output is `42 42`. Yet C17 6.2.6.1 §6 says "The value of a structure or union object is never a trap representation" – Lundin Sep 04 '18 at 11:29
  • Actually the linked snippet is even stranger since there's an assignment from _Bool to _Bool and the compiler had a chance to assume 0 or 1. – Lundin Sep 04 '18 at 11:31
  • @Lundin `no_trap_t no_trap = {b};` is still invalid because it reads a trap representation. What p6 means is that if that were valid (or if you get a trap representation in a structure member some other way), then performing `no_trap = no_trap;` is also valid: even though it copies a member which has a trap representation, the structure as a whole does not. – Sep 04 '18 at 11:32
  • 6.2.6.2 p1 and its footnote 53 (non-normative, I know, but it displays the intent of the standard) seem to suggest that unsigned integer types (incl. `_Bool` according to 6.2.5 p6) can have combinations of padding bits that are not trap reps. In that case it would be possible to have a conforming program which would give the output `0 0` (assuming the LSB is the value bit and the rest are padding). – user694733 Sep 04 '18 at 11:39
  • @user694733 Yes, an implementation is allowed to decide that. There is a requirement that all bits zero is a valid representation of zero for all integer types (6.2.6.2p5), but it need not be the only valid representation of zero. –  Sep 04 '18 at 11:45
  • @Lundin: The purpose of guaranteeing that a struct or union is never a trap representation is essentially to say that a compiler may only replace struct assignment with member-by-member assignment if there are no possible bit patterns the structure might hold that would cause such operations to have side effects. If a struct member holds a bit pattern which is a trap representation or shares its meaning with any other, I think the Standard allows the corresponding member of the destination structure to hold any equivalent bit pattern (or any pattern at all if the source is a trap rep). – supercat Sep 04 '18 at 16:48
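To see what this comment thread is driving at, here is a reconstruction of the struct variant (an assumption: the exact code behind Lundin's coliru link is not reproduced here; only the `no_trap_t`/`no_trap` identifiers are taken from the comments):

#include <stdio.h>

typedef struct { _Bool b; } no_trap_t;

int main(void)
{
  _Bool b = 0;
  *(unsigned char*)&b = 42;

  no_trap_t no_trap = { b };  /* still UB: the initializer reads b as a _Bool */
  no_trap = no_trap;          /* OK: a whole-struct value is never a trap
                                 representation (C17 6.2.6.1 §6) */

  printf("%d %d\n", no_trap.b, no_trap.b != 0);  /* UB; observed output: 42 42 */
  return 0;
}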