18

In the following code, I memset() a stdbool.h bool variable to value 123. (Perhaps this is undefined behaviour?) Then I pass a pointer to this variable to a victim function, which tries to protect against unexpected values using a conditional operation. However, GCC for some reason seems to remove the conditional operation altogether.

#include <stdio.h>
#include <stdbool.h>
#include <string.h>

void victim(bool* foo)
{
    int bar = *foo ? 1 : 0;
    printf("%d\n", bar);
}

int main()
{
    bool x;
    bool *foo = &x;
    memset(foo, 123, sizeof(bool));
    victim(foo);
    return 0;
}
user@host:~$ gcc -Wall -O0 test.c
user@host:~$ ./a.out 
123

What makes this particularly annoying is that the victim() function is actually inside a library, and will crash if the value is more than 1.

Reproduced on GCC versions 4.8.2-19ubuntu1 and 4.7.2-5. Not reproduced on clang.

Thomas Padron-McCarthy
  • 27,232
  • 8
  • 51
  • 75
jpa
  • 10,351
  • 1
  • 28
  • 45
  • From C99 standard: 6.5.2 2 An object declared as type _Bool is large enough to store the values 0 and 1. – Severin Pappadeux Dec 26 '14 at 20:49
  • 8
    By defining `x` as a `bool`, you've promised the compiler that you'll only store `0` or `1` in it. By storing `123` in `x`1, you've lied to the compiler. "If you lie to the compiler, it will get its revenge." -- Henry Spencer – Keith Thompson Dec 26 '14 at 20:52
  • 3
    @SeverinPappadeux: Yes, but since any non-bitfield object must be at least `CHAR_BIT` bits (and `CHAR_BIT >= 8`), it's also large enough to hold the value `123`. You're not prevented from storing `123` in a `bool` object because of its size, but it's undefined behavior. – Keith Thompson Dec 26 '14 at 20:53
  • As you have `` header included you might consider using `true` and `false` value only (just like in Pascal with `Boolean` type). It'll make your code slightly more readable and keep you away from other values. – Grzegorz Szpetkowski Dec 26 '14 at 22:33
  • @KeithThompson: I don't think the standard forbids the compiler from storing `true` as the bit pattern `01111011`. More importantly, it may pick `11111111` as the bit pattern for `true`. A conversion of `true` to `int` must give 1, but that doesn't restrict the bit pattern of `true`. (Compare: `1.0f` to `int`.) – MSalters Dec 27 '14 at 01:36
  • @MSalters: `bool` is an integer type, and it must follow the same rules as other integer types. `true` is a macro, defined in ``, that expands to `1`. If assigning the value `1` to a `bool` object sets its bit pattern to `11111111`, then the high-order 7 bits must be padding bits. (And all-bits-zero must be a representation for `0`, though not necessarily the only representation.) – Keith Thompson Dec 27 '14 at 01:39
  • @KeithThompson: Can't the implementation restrict the values of padding bits? Because my example showed 7 padding bits `0111101` before the value bit `1`? And basically the comment to which I replied assumed those bits necessarily had to be zero, which I believe to be particularly untrue. – MSalters Dec 27 '14 at 01:48
  • 2
    @MSalters: Some bit patterns can be trap representations; accessing an object with such a representation (via an lvalue of the appropriate type) causes undefined behavior. The rules for `_Bool` conspire to make things confusing in ways that I'm too lazy to explore at the moment. Bottom line: Storing values other than `0` and `1` in a `_Bool` object is a thing to be avoided. – Keith Thompson Dec 27 '14 at 02:02

3 Answers3

16

When GCC compiles this program, the assembly language output includes the sequence

movzbl (%rax), %eax
movzbl %al, %eax
movl %eax, -4(%rbp)

which does the following:

  1. Copy 32 bits from *foo (denoted by (%rax) in assembly) to the register %eax and fill in the higher-order bits of %eax with zeros (not that there are any, because %eax is a 32-bit register).
  2. Copy the low-order 8 bits of %eax (denoted by %al) to %eax and fill in the higher-order bits of %eax with zeros. As a C programmer you would understand this as %eax &= 0xff.
  3. Copy the value of %eax to 4 bytes above %rbp, which is the location of bar on the stack.

So this code is an assembly-language translation of

int bar = *foo & 0xff;

Clearly GCC has optimized the line based on the fact that a bool should never hold any value other than 0 or 1.

If you change the relevant line in the C source to this

int bar = *((int*)foo) ? 1 : 0;

then the assembly changes to

movl (%rax), %eax
testl %eax, %eax
setne %al
movzbl %al, %eax
movl %eax, -4(%rbp)

which does the following:

  1. Copy 32 bits from *foo (denoted by (%rax) in assembly) to the register %eax.
  2. Test 32 bits of %eax against itself, which means ANDing it with itself and setting some flags in the processor based on the result. (The ANDing is unnecessary here, but there's no instruction to simply check a register and set flags.)
  3. Set the low-order 8 bits of %eax (denoted by %al) to 1 if the result of the ANDing was 0, or to 0 otherwise.
  4. Copy the low-order 8 bits of %eax (denoted by %al) to %eax and fill in the higher-order bits of %eax with zeros, as in the first snippet.
  5. Copy the value of %eax to 4 bytes above %rbp, which is the location of bar on the stack; also as in the first snippet.

This is actually a faithful translation of the C code. And indeed, if you add the cast to (int*) and compile and run the program, you'll see that it does output 1.

David Z
  • 128,184
  • 27
  • 255
  • 279
  • This is by far the best answer. I guess you've got fewer votes for being "late to the party". – Alex Dec 27 '14 at 11:04
  • 3
    This tells "what does GCC actually do", which is a good complement to hvd's answer "why is GCC allowed to do this". I happened to be more interested in the latter, which is why I've accepted that one instead. – jpa Dec 27 '14 at 15:25
  • @jpa yeah, I figured this would be a good complement to the other answer. Though I would note that your question doesn't actually _ask_ anything (it just says "Look at this weird behavior"), which makes it kind of hard to tell which sort of answer you wanted. – David Z Dec 27 '14 at 19:01
15

(Perhaps this is undefined behaviour?)

Not directly, but reading from the object afterwards is.

Quoting C99:

6.2.6 Representations of types

6.2.6.1 General

5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. [...]

Basically, what this means is that if a particular implementation has decided that the only two valid bytes for a bool are 0 and 1, then you'd better make sure you don't use any trickery to attempt to set it to any other value.

Community
  • 1
  • 1
  • The [Gcc doc](https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation) reads _GCC supports only two's complement integer types, and all bit patterns are ordinary values._ This means, `_Bool` also hasn't got trap representations. Not sure, if this is sloppiness in the documentation, or if there is something else in the standard allowing this optimization, though. – mafso Dec 27 '14 at 06:50
  • @mafso I think that's sloppiness in the wording. Unsigned integer types (including `_Bool`) can never use two's complement, because they don't have any sign bit at all. –  Dec 27 '14 at 10:50
  • Yes, the two's complement doesn't apply, but that doesn't change the "all bit patterns are ordinary values" and `_Bool` must have at least 8 bits. To put my question differently: Would it be, strictly speaking, necessary for Gcc to document `_Bool` as having `CHAR_BIT - 1` padding bits to make the optimization in the question possible? I'm not sure if there is perhaps another part in the standard mandating this. [...] – mafso Dec 27 '14 at 11:27
  • [...] That `_Bool` must be able to represent 0 and 1 doesn't automatically mean it's UB to store something different in it. That a `char` must be capable of holding every character of the basic execution character set doesn't mean you cannot store anything else in it (127 is perfectly legal, for example, no matter if this is in the basic execution character set). – mafso Dec 27 '14 at 11:28
  • @mafso What I was trying to point out was that that entire sentence is meant to cover only signed integer types: it is the explanation of an implementation-defined aspect of the signed integer types. I agree that it is poorly worded, but if we do take it as applying only to the signed integer types, then unless there is another part of the documentation making poorly worded claims about the representation of `_Bool`, there is no requirement for GCC to document how many of `_Bool`'s possible representations are valid. –  Dec 27 '14 at 11:46
  • @mafso BTW, "doesn't automatically mean it's UB to store something different in it" is right. It's also how I started my answer. It's not the store that's invalid. It cannot be, since it is done using a character type. It's the following read that's invalid. –  Dec 27 '14 at 11:47
  • To my understanding, padding bits are implementation-defined. And `_Bool` has (at least) one value bit, no sign bit, and at least `CHAR_BIT` bits in total. If the optimization in question is allowed _by the quote you provide_, it means, representations other than 0 and 1 are trap representations, which means (again, how I understand it) at least 7 padding bits. Put yet another way: To make your quote applicable, you must show that 123 is indeed a trap representation for `_Bool` (for Gcc in this case, if it's really implementation-defined). – mafso Dec 27 '14 at 11:58
  • `s/store...in/store or read...in or from/g` for that sentence (my reasoning still applies). – mafso Dec 27 '14 at 11:59
  • 1
    @mafso There is no requirement in the standard for an implementation to document whether any type has padding bits, except indirectly through comparisons of `sizeof(T) * CHAR_BIT` to the value of `T_MAX` as defined in ``. If a type has padding bits, there is no requirement in the standard for an implementation to document whether those padding bits are significant (whether wrong values of padding bits can produce trap representations). –  Dec 27 '14 at 13:09
  • I see. I thought all padding bits needed to be documented, but it's only for the value of the sign bit and trap representations like negative zero or one less than `-T_MAX`. Thanks for your patience. – mafso Dec 27 '14 at 13:27
12

Storing a value different than 0 or 1 in a bool is undefined behavior in C.

So actually this:

int bar = *foo ? 1 : 0;

is optimized with something close to this:

int bar = *foo ? *foo : 0;
ouah
  • 142,963
  • 15
  • 272
  • 331