2

Consider the program below.

All comparisons are true with a recent gcc but only the value 1 compares equal with the Visual Studio commandline compiler v. 19.16.27031.1 for x86.

I believe that it's generally OK to write into PODs through char pointers; but is there wording in the standard about writing funny values into bool variables? If it is allowed, is there wording about the behavior in comparisons?

#include <iostream>
using namespace std;

void f()
{
    if(sizeof(bool) != 1)
    {
        cout << "sizeof(bool) != 1\n";
        return;
    }

    bool b;

    *(char *)&b = 1;
    if(b == true) { cout << (int) *(char *)&b << " is true\n"; }

    *(char *)&b = 2;
    if(b == true) { cout << (int) *(char *)&b << " is true\n"; }

    *(char *)&b = 3;
    if(b == true) { cout << (int) *(char *)&b << " is true\n"; }
}

int main()
{
    f();
}

P.S. gcc 8.3 uses a test instruction to effectively check for non-zero while gcc 9.1 explicitly compares with 1, making only that comparison true. Perhaps this godbolt link works.

Peter - Reinstate Monica
  • Use boolean conversions instead of writing direct value: http://eel.is/c++draft/conv.bool#1 – AnatolyS Jun 26 '19 at 15:34
  • 3
    @AnatolyS Since the OP is writing to the bool through a cast, I’m not sure that’s applicable. – templatetypedef Jun 26 '19 at 15:37
  • 2
    It’s undefined to write *anything* that’s not a `bool` into a `bool`, including 0 and 1. – molbdnilo Jun 26 '19 at 15:40
  • @molbdnilo Every memcpy bool->bool writes 1 or 0 *as chars*, so that cannot be. Obviously I can use any other memory location as a source, it doesn't have to be a bool at all. The question here is which limits the standard imposes on the value range of the chars written. (That the code is not *kosher* is not in question.) Btw, this question arises with bools because other integer values do not have bit patterns which are out-of-range. Comparable issues may arise with floats though. – Peter - Reinstate Monica Jun 26 '19 at 15:42
  • @PeterA.Schneider sizeof(bool) is implementation-defined, so why do you think you have the right to write to a bool via a char pointer? The one legal way to change a bool variable is to use boolean conversions. – AnatolyS Jun 26 '19 at 15:49
  • My answer on the dupe specifically addresses the standard wording portion. – Lightness Races in Orbit Jun 26 '19 at 16:31
  • @LightnessRacesinOrbit Thanks for the pointer, I did not find it (or any other). While the standard quote is close it is not quite on the spot because after writing into the bool's memory it is not uninitialized any longer, so it is not "described by this document as 'undefined'", at least not because it is uninitialized (it is initialized). The question is more concerned with the range of applicable values. – Peter - Reinstate Monica Jun 26 '19 at 16:39
  • @PeterA.Schneider It's exactly on the spot. An uninitialised `bool` is just one case of a `bool` that does not have a well-defined and congruous bit representation. Your approach is another way to get that outcome. There's really not much scope for splitting hairs here; you cannot arbitrarily pick a bit representation for a `bool` object, and doing so has undefined behaviour; end of story! – Lightness Races in Orbit Jun 26 '19 at 17:07

5 Answers

4

No. This is not OK.

Writing arbitrary data into a bool is very much UB (see What is the strict aliasing rule?), and is similar to the case discussed in Does the C++ standard allow for an uninitialized bool to crash a program?

*(char *)&b = 2;

This type-punning hack invokes UB. Depending on how your compiler implements bool and which optimizations it is allowed to perform, you could have demons flying out of your nose.

YSC
  • 1
    The strict aliasing rule is what comes to everybody's mind but it simply does not apply to aliasing through char pointers, for obvious legacy and efficiency reasons. A little less obvious is that (afaics) the byte source of a byte copy into a POD does not need to be an object of the same type. In fact, it does not need to be an object at all, as when one uses `memset()`. So what we are discussing here is the value range a byte may have when it is copied into a bool. Is it defined? Implementation defined? What happens when it is exceeded? – Peter - Reinstate Monica Jun 26 '19 at 16:04
  • 2
    Aliasing has nothing to do with this. You can alias with a `char*`. The problem is in using that power to write "wrong" bit values to an object of type `T` (here, `bool`). – Lightness Races in Orbit Jun 26 '19 at 16:32
2

Consider:

bool b;
b = char{2};     // 1
(char&)b = 2;    // 2
*(char*)&b = 2;  // 3

Here, lines 2 and 3 have the same meaning, but 1 has a different meaning. In line 1, since the value being assigned to the bool object is nonzero, the result is guaranteed to be true. However, in lines 2 and 3, the object representation of the bool object is being written to directly.

It is indeed legal to write to an object of any non-const type through an lvalue of type char, but:

In C++17, the standard does not specify the representation of bool objects. The bool type may have padding bits, and may even be larger than char. Thus, any attempt to write directly to a bool value in this way may yield an invalid (or "trap") object representation, which means that subsequently reading that value will yield undefined behaviour. Implementations may (but are not required by the standard to) define the representation of bool objects.

In C++20, my understanding is that thanks to P1236R1, there are no longer any trap representations, but the representation of bool is still not completely specified. The bool object may still be larger than char, so if you write to only the first byte of it, it can still contain an indeterminate value, yielding UB when accessed. If bool is 1 byte (which is likely), then the result is unspecified: it must yield some valid value of the underlying type (which will most likely be char or its signed or unsigned cousin), but the mapping of such values to true and false remains unspecified.
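
As a rough illustration of the distinction drawn above, the sketch below only reads the object representation of a bool and copies bytes that originally came from a valid bool; it never invents a bit pattern of its own. The helper names bytes_of and copy_via_bytes are made up for this example, and the bytes observed are whatever the implementation happens to use.

```cpp
#include <array>
#include <cstring>

// Copies the object representation of a bool into a byte array.
// Reading the bytes of an object is permitted; which bytes appear
// is up to the implementation.
std::array<unsigned char, sizeof(bool)> bytes_of(bool value)
{
    std::array<unsigned char, sizeof(bool)> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(bool));
    return bytes;
}

// Round-trips a bool through a byte buffer. Because the bytes were
// taken from a valid bool and all sizeof(bool) bytes are copied back,
// the destination holds the same value as the source.
bool copy_via_bytes(bool source)
{
    const auto bytes = bytes_of(source);
    bool dest = false;
    std::memcpy(&dest, bytes.data(), sizeof(bool));
    return dest;
}
```

Printing the result of bytes_of(true) on a given compiler shows what that implementation stores, but it should not be read as a guarantee about any other implementation.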

Brian Bi
  • OK, I assured that sizeof(bool) == 1. – Peter - Reinstate Monica Jun 26 '19 at 15:53
  • @PeterA.Schneider I hope my answer addressed your questions. You may still end up with a trap representation in C++17 since you may be writing a bit pattern that is not a valid `bool` value. – Brian Bi Jun 26 '19 at 15:56
  • @eerorika Not in gcc 8.3 and VS 2017 (bool is a byte). – Peter - Reinstate Monica Jun 26 '19 at 16:10
  • @PeterA.Schneider yeah, I now notice that I misread the standard. 1 byte is OK on all systems, and indeed even typical. Just not guaranteed. – eerorika Jun 26 '19 at 16:21
  • This is, I think, the best answer so far. Interestingly the standard (at least the amendment) is specifically discussing traps; I can assure you there weren't any ;-). I suppose that indeed bits 2..8 are padding bits, so the general padding bit discussion does apply. The amendment wording *Each set of values for any padding bits in the object representation **are alternative representations** of the value specified by the value representation* seems to suggest to me that I *could indeed* legally write arbitrary values into the bool, and only the "value bits" (probably just bit 1) are relevant – Peter - Reinstate Monica Jun 26 '19 at 16:50
  • @PeterA.Schneider Yes, the intent of the changes was to prohibit integer types from having trap representations. However, it doesn't say that bit 0 of `bool` is the value bit and the other bits are padding bits. Since it says `bool` has the same object representation and value representation as some other integer type, that means `bool` has as many padding bits as that other type. Also, the implementation is free to do something strange like saying that the value 0 of the underlying type is interpreted as `true`. – Brian Bi Jun 26 '19 at 16:54
  • Oh, I missed the "same value representation" part. In other words, no padding bits in a bool on my system. The wording is strange. – Peter - Reinstate Monica Jun 26 '19 at 17:06
1

It's OK to assign values other than true and false to a variable of type bool.

The RHS is converted to a bool by using the standard conversion sequence to true/false before the value is assigned.

However, what you are trying to do is not OK.

*(char *)&b = 2;  // Not OK
*(char *)&b = 3;  // Not OK

Even assigning 1 and 0 by using that mechanism is not OK.

*(char *)&b = 1;  // Not OK
*(char *)&b = 0;  // Not OK

The following statements are OK.

b = 2; // OK
b = 3; // OK

Update, in response to OP's comment.

From the standard/basic.types#basic.fundamental-6:

Values of type bool are either true or false.

The standard does not mandate that true be represented as 1 or that false be represented as 0. An implementation can choose whichever representation best suits its needs.

The standard goes on to say this about value of bool types:

Using a bool value in ways described by this International Standard as “undefined,” such as by examining the value of an uninitialized automatic object, might cause it to behave as if it is neither true nor false.

Storing the value char(1) or char(0) in its memory location indirectly does not guarantee that the values will be properly converted to true/false. Since those values may not represent either true or false in an implementation, accessing them would lead to undefined behavior.
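
If the goal is to turn a raw byte (say, one read from a buffer or a file) into a bool, one possible alternative is to let the language perform the conversion instead of storing the byte into the bool's memory. This is only a sketch; bool_from_byte is a hypothetical helper name, not something from the standard library.

```cpp
// Hypothetical helper: converts a raw byte to a bool by value.
// The comparison yields a proper bool, so the result is always
// exactly true or false, whatever the implementation's representation.
bool bool_from_byte(unsigned char raw)
{
    return raw != 0;
}
```

With this approach the bytes 2 and 3 both produce true, and the two results compare equal, because the representation is chosen by the compiler rather than by the caller.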

R Sahu
  • They are assigning to a `char&` that refers to the same address as the `bool`. The integer is converted to a `char`, which has different conversion sequence. – eerorika Jun 26 '19 at 15:34
  • 1
    `*(char *)&b = 1; *(char *)&b = 0; // Also not OK` (although they will *probably* work). – Martin Bonner supports Monica Jun 26 '19 at 15:37
  • I have heard your opinion. Now can you justify it with the standard? ;-) – Peter - Reinstate Monica Jun 26 '19 at 16:00
  • @PeterA.Schneider, which parts? The OK parts or the "Not OK" parts? – R Sahu Jun 26 '19 at 16:00
  • @I'm aware of narrowing conversions, so I'm more interested in the answer to my question, which are the "Not OK" parts. So far you have given zero rationale. I actually think that (provided that 1 and 0 are the internal representations of bools, and that bools have the size 1) `*(char *)&b = 1;` is perfectly OK. It's equivalent to `bool dest, src=true; memcpy(&dest, &src, 1)` provided memcpy is not an intrinsic. – Peter - Reinstate Monica Jun 26 '19 at 16:05
  • OK, one step further. Now show me where in the standard writing 2 in a bool memory is described as undefined. (It is *not* UB due to un-initialization because it *is* initialized.) – Peter - Reinstate Monica Jun 26 '19 at 16:43
  • 1
    @PeterA.Schneider, anything that is not explicitly defined as "well defined" is undefined behavior. See https://timsong-cpp.github.io/cppwp/n3337/defns.undefined. – R Sahu Jun 26 '19 at 16:51
1

Writing any integer values into a bool through a pointer to a type other than bool is undefined behavior, because those may not match the compiler's representation of the type. And yes, writing something other than 0 or 1 will absolutely break things: compilers often rely on the exact internal representation of boolean true.

But bool b = 3 is fine and just sets b to true (the rule for converting from integer types to bool is that any nonzero value becomes true and zero becomes false).

Sneftel
  • 2
    `thanks to the strict aliasing rule` Aren't all types allowed to be aliased by `char`? – eerorika Jun 26 '19 at 15:37
  • @eerorika Yes, but that only really makes a difference if you're copying in from a different `char*`-reinterpreted `bool`. The compiler isn't required to use any particular size or bit representation for the type. – Sneftel Jun 26 '19 at 15:39
  • 1
    But that means that I *can* legally write to the object through a char pointer. The strict aliasing rule simply does not apply, in order to permit `memcpy()`. – Peter - Reinstate Monica Jun 26 '19 at 15:41
  • @PeterA.Schneider That's a good point. I'll edit the reasoning. – Sneftel Jun 26 '19 at 15:41
  • @Sneftel I do not believe that there is such a source type requirement for the bytes copied into a POD. For example, one could simply copy from an array of 0-value bytes into a POD, effectively doing a memset(ptr, 0, size). Sure, copying between the same types is a common pattern, but it is no requirement. – Peter - Reinstate Monica Jun 26 '19 at 15:57
  • 1
    No, that's not what I mean. Other than looking at the bit pattern of some other `bool`, there's no way to determine what the valid bit patterns of a `bool` are. All zeroes is *probably* the bit pattern of `false`, but that's not guaranteed. – Sneftel Jun 26 '19 at 16:38
0

In general, it's perfectly fine to assign values other than 0 or 1 to a bool:

7.3.14 Boolean conversions [conv.bool] 1 A prvalue of arithmetic, unscoped enumeration, pointer, or pointer-to-member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true.
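
The quoted rule can be sketched as follows. The function names from_int, from_pointer, and to_int are invented for this illustration; the conversions themselves are the ordinary language conversions described in [conv.bool] and the integral promotion of bool.

```cpp
// Boolean conversion from an integer: zero becomes false,
// every other value becomes true.
bool from_int(int v)
{
    return static_cast<bool>(v);
}

// Boolean conversion from a pointer: a null pointer becomes false,
// any other pointer becomes true.
bool from_pointer(const void* p)
{
    return static_cast<bool>(p);
}

// Converting back to int: true becomes 1 and false becomes 0,
// regardless of which nonzero value produced the bool.
int to_int(bool b)
{
    return b;
}
```

So to_int(from_int(3)) yields 1, not 3: the conversion normalizes the value, which is exactly what writing through a char pointer bypasses.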

But your casting is another question entirely.

Be careful about assuming it's OK to write to objects through pointers to some other type. You can get very surprising results, and the optimizer is allowed to assume that certain such things are not done. I don't know all the rules, but the optimizer doesn't always track writes through pointers to different types (and it is allowed to do all sorts of things in the presence of undefined behavior!). Beware of code like this:

bool f()
{
    bool a = true;
    bool b = true;
    *reinterpret_cast<char*>(&a) = 1;
    *reinterpret_cast<char*>(&b) = 2;
    return a == b;
}

Live: https://godbolt.org/z/hJnuSi

With optimizations:

g++: -> true (but the value is actually 2)
clang: -> false

int main() {
    std::cout << f() << "\n";  // g++ prints 2!!!
}

Though f() returns a bool, g++ actually prints out 2 in main here. Probably not expected.

Chris Uzdavinis