
Bools are supposed to convert to 1 for true values and to 0 otherwise. However, that says nothing about how they may actually be stored in memory. What happens if I store an arbitrary non-zero value in a bool? Does the standard guarantee correct behavior when casting such a bool to an int?

For example, given the following program,

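#include <string.h>

int main()
{
  bool b;
  // Overwrite every byte of b's object representation with the value 123.
  memset( &b, 123, sizeof( b ) );

  // b is converted to int here; the question is whether this must yield 1.
  return b;
}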

Does the standard guarantee the program would return 1?

dragonroot
  • If you run that on a big-endian CPU, you're likely to return `0`. – Mark Ransom Dec 30 '15 at 03:43
  • @mark: memset sets all the bytes to the same value (123, in this case) – rici Dec 30 '15 at 03:50
  • *"Does the standard guarantee correct behavior when ..."* - the Standard *always* guarantees correct behaviour **by definition**, because when it defines behaviour any compliant implementation must provide, and when it doesn't any behaviour is correct. Your lead in "Bools are supposed" implies you have `int{1}` in mind as being "correct", but that's bogus given the `memset` causes Undefined Behaviour. – Tony Delroy Dec 30 '15 at 05:06
  • FWIW, Footnote 48) of 3.9.2/1 [basic.compound] says "Using a `bool` value in ways described by this International Standard as “undefined,” such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither `true` nor `false`.", so reasoning about how a cast from `true` or `false` should map to `1` or `0` is already flawed as you may not have started with either after invoking undefined behaviour. – Tony Delroy Dec 30 '15 at 05:09
  • @TonyD I don't see any Standard text that makes OP's code undefined in the first place. Typically , uninitialized variables are no longer considered uninitialized if they've been written to with `memset`. – M.M Dec 30 '15 at 06:13
  • @M.M. the text I quoted used *"such as by...uninitialised..."* as an example of a cause of undefined behaviour, not as the only relevant cause. Writing `123` bytes over a `bool` is the specific cause of undefined behaviour that's relevant here. – Tony Delroy Dec 30 '15 at 07:03
  • @TonyD how does writing 123 bytes over a bool cause UB? – M.M Dec 30 '15 at 07:04
  • @M.M. the Standard has some provisions (3.9/2) for writing bytes over objects with things like `memcpy` *if* the data was originally copied from another such object that was in a valid state at the time (amongst other stipulations). You can't just blat arbitrary bytes over C++ objects - whether *bool* or not. I'll dig up the chapter and verse if that's unfamiliar to you. – Tony Delroy Dec 30 '15 at 07:07
  • @TonyD that would be great, I looked and couldn't find anything that clearly covered the behaviour after memcpy'ing in an invalid representation for an integer type (and also there is the issue of whether it is a valid representation or not) – M.M Dec 30 '15 at 07:10
  • @M.M. proven harder to pinpoint than I expected... perhaps it's considered too obvious to bother stating. Writing arbitrary values into the memory forming object's value representation is very obviously crazy. There are some statements about intent to build on C's memory model - perhaps it's more explicit. There's also a [SO answer](http://stackoverflow.com/questions/33380742/is-it-safe-to-memset-bool-to-0) that casts doubt on whether even `memset` to `0` for `false` is guaranteed to work. Still, I'm going to draw a line under the 20 minutes I've spent looking. – Tony Delroy Dec 30 '15 at 07:52
  • @TonyD yeah, it took me a whole lot longer than twenty minutes to piece together the info I used in my answer. I was just annoyed with the existing answers. It was an interesting question in which none of the answers actually tried to support their position. – Shafik Yaghmour Dec 30 '15 at 09:48

1 Answer


No. Reading from that bool after the memset is at least unspecified behaviour (see below), so there is no guarantee as to what value will be returned.

It might turn out that, on a particular architecture, the value representation of a bool consists only of the high-order bit, in which case the value produced by broadcasting 123 over the byte(s) of the bool would turn out to be a representation of false.
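To make the arithmetic concrete, here is a minimal sketch of that hypothetical case. The function `decode_high_bit_bool` is invented for illustration and models an imaginary implementation in which only the high-order bit of the byte is a value bit; it does not describe how any real compiler treats bool.

```cpp
// Hypothetical decoding rule (an assumption for illustration only):
// the value of the bool is carried solely by the high-order bit of its byte.
constexpr bool decode_high_bit_bool(unsigned char byte)
{
    return (byte & 0x80u) != 0;
}

// 123 is 0b01111011, so its high-order bit is clear.
static_assert(!decode_high_bit_bool(123), "123 would decode as false");
// 0x80 is 0b10000000, so its high-order bit is set.
static_assert(decode_high_bit_bool(0x80), "0x80 would decode as true");
```

Under that assumed rule, a non-zero byte such as 123 would read back as false, which is why the non-zero fill alone guarantees nothing.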

The C++ standard does not specify what the actual bit patterns representing the values true and false are. An implementation may use any or all of the bits in the object representation of a bool -- which must be at least one byte, but might be longer -- and it may map more than one bit pattern to the same value:

§3.9.1 [basic.fundamental]/1:

…For narrow character types, all bits of the object representation participate in the value representation. For unsigned narrow character types, each possible bit pattern of the value representation represents a distinct number. These requirements do not hold for other types.
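Since the bit patterns are left to the implementation, the portable way to learn what they are is to look at them on the platform at hand. The following is a rough sketch of doing so by copying the bytes of a valid bool into an unsigned char buffer; the helper names `object_bytes` and `representations_differ` are made up for this example, and the bytes it finds are implementation-specific.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Copy the object representation of a valid bool into a byte buffer.
// Reading the bytes of an object this way is fine; what they contain
// is up to the implementation.
std::vector<unsigned char> object_bytes(bool value)
{
    std::vector<unsigned char> bytes(sizeof(bool));
    std::memcpy(bytes.data(), &value, sizeof(bool));
    return bytes;
}

// true and false are distinct values, so their bytes must differ somewhere.
bool representations_differ()
{
    return object_bytes(true) != object_bytes(false);
}

void check_representations()
{
    assert(object_bytes(true).size() == sizeof(bool));
    assert(representations_differ());
}
```

Note that this only goes in the safe direction, from a valid bool to bytes. It says nothing about which bool value, if any, an arbitrary byte pattern would map back to.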

Paragraph 6 of the same section requires values of type bool to be either true or false, but a footnote points out that in the face of undefined behaviour a bool "might behave as if it is neither true nor false." (That's obviously within the bounds of undefined behaviour; if a program exhibits UB, there are no requirements whatsoever on its execution, even before the UB is evidenced.)

Nothing in the standard permits using low-level memory copying operations on objects other than arrays of narrow chars, except for the case in which the object is trivially copyable and the object representation is saved by copying it to a buffer and later restored by copying it back. Any other use of C library functions which overwrite arbitrary bytes in an object representation should be undefined by the general definition of undefined behaviour ("[the standard] omits any explicit definition of behavior"). But I'm forced to agree that there is no explicit statement that memset is UB, and so I'll settle on unspecified behaviour, which seems quite clear since the representation of bool is certainly unspecified.
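For contrast, the save-and-restore case that the standard does describe (3.9/2) might look like the sketch below. The function names `round_trip` and `check_round_trip` are hypothetical; the point is only that the bytes written back into the bool were originally copied out of a valid bool.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

static_assert(std::is_trivially_copyable<bool>::value,
              "bool is a scalar type and therefore trivially copyable");

// Save the bytes of a valid bool, overwrite the target, then restore it.
bool round_trip(bool original)
{
    unsigned char buffer[sizeof(bool)];
    std::memcpy(buffer, &original, sizeof(bool));  // save the object representation

    bool restored = !original;                     // start from the opposite value
    std::memcpy(&restored, buffer, sizeof(bool));  // restore the saved bytes
    return restored;
}

void check_round_trip()
{
    assert(round_trip(true) == true);
    assert(round_trip(false) == false);
}
```

The memset in the question differs in exactly this respect: the byte value 123 never came from a valid bool, so nothing in that provision covers it.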

rici
  • Well, FWIW, both gcc and clang make it return 1, which I would find strange in that case, since it would seem like an extra work to me. – dragonroot Dec 30 '15 at 03:30
  • @dragonroot: undefined behaviour is allowed to do that. :) – rici Dec 30 '15 at 03:37
  • Seems like a good answer, but I'd appreciate more about why this becomes undefined. – Mark Ransom Dec 30 '15 at 03:44
  • I never said the compilers were misbehaving, just that they seem to be doing more work than necessary in that case – dragonroot Dec 30 '15 at 03:50
  • can you give a standard cite that it is UB? The term "trap representation" does not appear in the C++ Standard – M.M Dec 30 '15 at 03:54
  • @M.M.: You're right; C++ only specifies traps by reference, and neither of the references apply to bool. So I removed that example. However, the standard explicitly does not specify which bit(s) in the object representation of a bool constitute the value representation, so the other example where the only bit in the value representation is the high-order bit seems to me to be valid. – rici Dec 30 '15 at 04:44
  • @M.M.: Ok, I can't find anything explicit. 3.9/2 seems to imply that `invalid values` exist, but the only types with invalid values I can find in the standard are floating point (by reference), pointers (explicitly) and signed and unsigned integers (by compatibility with C). `bool` isn't in any of those categories, and `std::numeric_limits::traps` is `false`, so it may well be that memsetting a bool to a random sequence of bytes is merely unspecified. – rici Dec 30 '15 at 06:05
  • @rici yeah I don't see anything definitive either – M.M Dec 30 '15 at 06:11
  • Note the related question [Is it safe to memset bool to 0?](http://stackoverflow.com/q/33380742/1708801) – Shafik Yaghmour Dec 30 '15 at 09:12
  • @shafik: yes, I ran into your useful answer after I'd written this one. I think memsetting with 0 is safer. – rici Dec 30 '15 at 13:06