
Suppose I have some legacy code which cannot be changed unless a bug is discovered, and it contains this code:

bool data[32];
memset(data, 0, sizeof(data));

Is this a safe way to set all bool in the array to a false value?

More generally, is it safe to memset a bool to 0 in order to make its value false?

Is it guaranteed to work on all compilers? Or do I need to request a fix?

Neil Kirk
  • according to [this thread](http://stackoverflow.com/questions/19351483/how-is-a-bool-represented-in-memory) the standard doesn't clearly specify whether all-bits-zero is a valid representation for bool (or what it represents) – M.M Oct 28 '15 at 00:24
  • I guess there are two answers here: Is it safe? Yes. Is it guaranteed? No. – user657267 Oct 28 '15 at 00:24
  • @user657267 I define "safe" as meaning "doing it results in `false`" – Neil Kirk Oct 28 '15 at 00:25
  • @NeilKirk Sure, and as others have pointed out, it's almost certainly safe, but there's no strict *guarantee* that it will work on every single compiler out there. – user657267 Oct 28 '15 at 00:29
  • It's not guaranteed to work on all compilers, but it will. There is too much legacy code relying on it for anyone to dare break it. – molbdnilo Oct 28 '15 at 00:32
  • @user657267 It's really only "safe" if it's guaranteed by either the standard or the implementation this code depends on. – PC Luddite Oct 28 '15 at 00:33
  • Can an observable bug be demonstrated on any platform using any compiler? I'd take a risk and say it can't. And what are the odds that if you moved your code to a vastly different platform and compiler that **this** would be among the many bugs you'll find? I'd say that probability approaches zero. I would let it slide. – Carey Gregory Oct 28 '15 at 00:46
  • @CareyGregory: While I agree in this case, such reasoning is how [unexpected bugs cause expensive spaceships to explode](http://www.around.com/ariane.html). I'd advise against it in general (hence my suggestion of an assertion, if the code cannot be changed to utilise `std::fill`). – Lightness Races in Orbit Oct 28 '15 at 00:53
  • @LightnessRacesinOrbit Having worked on code that could cause spaceships to blow up, I can tell you that "fixing" theoretical flaws like this would most likely be judged as riskier than leaving it alone. All changes, even the most innocuous, bring risk. – Carey Gregory Oct 28 '15 at 00:55
  • @CareyGregory: Right, which is why spaceships blow up. (You're not the only one!) – Lightness Races in Orbit Oct 28 '15 at 00:57
  • @LightnessRacesinOrbit Indeed, it is a reason. Another reason is people fixing things that aren't broken. I once saw a major product release fail catastrophically because someone removed an errant punctuation mark from an error message. (Utterly harmless, right? Yes, but it moved memory one byte, just enough to expose a formerly harmless data overrun.) If you can find a way to avoid both problems, let me know and we'll write a book and get rich. – Carey Gregory Oct 28 '15 at 01:05
  • @CareyGregory: But it _is_ broken. Ignoring code that works by pure chance is not "leaving working code in place". It is "leaving broken code in place". I agree that it's acceptable to leave this code in due to practical realities, but if it were to be deployed on an expensive spaceship, the _minimum_ I would accept for even the first stage of code review would be a compile-time assertion. – Lightness Races in Orbit Oct 28 '15 at 01:06
  • Does the standard prevent a space-optimizing compiler from implementing an array of bools as a packed bitword? – AShelly Oct 28 '15 at 01:07
  • @AShelly: No! 5.3.3/1 explicitly (albeit non-normatively) points out that `sizeof(bool)` is implementation-defined. Which sort of gets us only halfway to the "no", I realise, but... – Lightness Races in Orbit Oct 28 '15 at 01:12
  • @LightnessRacesinOrbit if this were new(ish) code, I would agree with you. But it's not. It's legacy code that can only be fixed with the justification of a "bug" and all the overhead that probably entails. Even just a compile-time assertion requires a new build, new packaging and deployment, with all the opportunities for breakage that those steps entail. – Carey Gregory Oct 28 '15 at 01:12
  • @CareyGregory: For the record, I'm not actually suggesting to anyone that it be changed. Remember, my first words to you were "while I agree in this case". :) – Lightness Races in Orbit Oct 28 '15 at 01:13
  • Do not. The person who one day tries to convert your code into another language may find you at your home while you are playing your favourite game on your new console, and s/he may ask you "Why?". – totten Oct 28 '15 at 07:00
  • @LightnessRacesinOrbit oh that is embarrassing, I pointed to the wrong question, this is the [one I meant to point to](http://stackoverflow.com/q/29394518/1708801). It indicates there is some fuzziness around object/value representation and it does not directly address this but it points to some gaps in the standard. – Shafik Yaghmour Oct 28 '15 at 12:32
  • @ShafikYaghmour: Indeed. (Note that I've answered it ;P) – Lightness Races in Orbit Oct 28 '15 at 13:25

4 Answers


Is it guaranteed by the law? No.

C++ says nothing about the representation of bool values.

Is it guaranteed by practical reality? Yes.

I mean, if you wish to find a C++ implementation that does not represent boolean false as a sequence of zeroes, I shall wish you luck. Given that false must implicitly convert to 0, and true must implicitly convert to 1, and 0 must implicitly convert to false, and non-0 must implicitly convert to true … well, you'd be silly to implement it any other way.
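The value-level rules in that paragraph are guaranteed by the language and can be checked directly. A minimal sketch (the function name check_bool_conversions is illustrative, not from any library); note that every check here is about values, so none of it says anything about which bits a bool occupies in memory:

```cpp
#include <cassert>

// The standard fixes these four conversions, so they hold on every
// conforming implementation and can be checked at compile time.
static_assert(static_cast<int>(false) == 0, "false converts to 0");
static_assert(static_cast<int>(true) == 1, "true converts to 1");
static_assert(static_cast<bool>(0) == false, "0 converts to false");
static_assert(static_cast<bool>(-7) == true, "non-zero converts to true");

// The same rules apply to values known only at run time. None of these
// checks looks at the bytes a bool occupies in memory.
void check_bool_conversions(int value) {
    bool b = static_cast<bool>(value);                    // int -> bool: non-zero is true
    assert(static_cast<int>(b) == (value != 0 ? 1 : 0));  // bool -> int: 0 or 1
}
```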

Whether that means it's "safe" is for you to decide.

I don't usually say this, but if I were in your situation I would be happy to let this slide. If you're really concerned, you can add a test executable to your distributable to validate the precondition on each target platform before installing the real project.
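One way to write such a test executable, as a sketch that assumes a value-level check is good enough for your deployment (the function names are hypothetical): copy all-zero bytes into a bool with memcpy, repeat the memset from the question, and compare the results with false.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical installer-time check: does an all-bits-zero bool read as false
// on this implementation? The zero bytes are copied into a real bool with
// memcpy instead of type-punning through a pointer cast.
bool zero_bytes_mean_false() {
    unsigned char zeros[sizeof(bool)] = {};  // all-bits-zero object representation
    bool b = true;
    std::memcpy(&b, zeros, sizeof b);
    return b == false;
}

// The pattern from the question: zero a bool array with memset, then check
// every element by value rather than by bit pattern.
bool memset_array_reads_as_false() {
    bool data[32];
    std::memset(data, 0, sizeof(data));
    for (bool b : data) {
        if (b) return false;
    }
    return true;
}
```

A test executable would call both functions from main and return a non-zero exit status if either returns false. On a hypothetical implementation where all-bits-zero is not a valid bool, reading the result would itself be undefined behaviour, so this is a practical smoke test rather than a proof.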

Lightness Races in Orbit
  • Hmmm... how did you do that? The last time I tried to post an answer that short it wouldn't let me. (And I'm sure the shortness explains the downvotes. You really could explain a bit, eh?) – Carey Gregory Oct 28 '15 at 00:17
  • @CareyGregory: 162 characters aren't enough? I guess you must not enjoy Twitter. – Lightness Races in Orbit Oct 28 '15 at 00:18
  • Upvoted, although I do wish there was some reference to the C++ spec. – nneonneo Oct 28 '15 at 00:19
  • No, I don't enjoy Twitter, but it was precisely 4 characters when I commented. – Carey Gregory Oct 28 '15 at 00:19
  • @CareyGregory: You must be imagining it! – Lightness Races in Orbit Oct 28 '15 at 00:32
  • @Neil: It's yes in practice but no in theory. – Lightness Races in Orbit Oct 28 '15 at 00:32
  • Nope, not imagining it. I'm pretty sure my delusions are under control today so maybe a burp by stackexchange. – Carey Gregory Oct 28 '15 at 00:38
  • @CareyGregory: It's also possible I'm winding you up. But this comment is not to be taken as an admission of guilt in any way. Say a friend of mine hypothetically... – Lightness Races in Orbit Oct 28 '15 at 00:39
  • Yeah, I suspected a "friend" might be involved. – Carey Gregory Oct 28 '15 at 00:42
  • @CareyGregory: Well there's a first time for everything! – Lightness Races in Orbit Oct 28 '15 at 00:43
  • @nneonneo: I cannot quote standard text that does not exist, and I will not quote the entirety of the standard to prove it! – Lightness Races in Orbit Oct 28 '15 at 00:45
  • It'd make some sense to allow any non-zero bit pattern to represent `true` – M.M Oct 28 '15 at 01:14
  • @M.M: Indeed, that is so. I am actually surprised to discover that GCC 5, or its runtime, goes out of its way to disallow such a representation; I do not remember previous versions being as strict (N.B. deliberately avoiding the [well-defined] int-to-bool conversion, and ignoring the pitfalls of type-punning): `{ int x = 42; cout << *(bool*)&x; }` → _"error: load of value 42, which is not a valid value for type 'bool'"_ – Lightness Races in Orbit Oct 28 '15 at 01:15
  • @LightnessRacesinOrbit [g++ compiles](http://goo.gl/fgd0bG) the test in `int qux(bool a, bool b) { if (a == b) return 3; else return 5; }` to `cmpb %sil, %dil` which suggests it only permits a single representation. The issue came up on [this recent thread](http://stackoverflow.com/questions/33206772/forcing-usage-of-bitwise-and-instead-of-boolean-and) (where my answer is probably wrong) – M.M Oct 28 '15 at 01:21
  • @M.M: It doesn't need to "permit" anything; it _decides_ the implementation that it uses ... or, more accurately, the ABI it implements does. I'm aware of no C++ ABI that specifies anything other than the logical `bool` representation that we all expect. However, that still says nothing about C++ standard mandates. – Lightness Races in Orbit Oct 28 '15 at 01:22
  • No, we were just talking about what would make sense. (But apparently there is some downside that the g++ designers saw.) – M.M Oct 28 '15 at 01:23
  • @M.M Fair enough. Yes, it appears to _strictly_ implement the ABI in this sense, if that's what you mean. – Lightness Races in Orbit Oct 28 '15 at 01:24
  • Regarding your last sentence, how would you test the representation of `bool` at *compile time*? – Nate Eldredge Oct 28 '15 at 04:13
  • @M.M: The most probable downside is optimization inhibition. In general, optimizations thrive on constraints, and therefore a more constrained range of values may enable more optimizations. Still, it would be interesting to read discussions where the implementers decided on this. – Matthieu M. Oct 28 '15 at 10:02
  • @MatthieuM.: On some platforms, testing whether a particular bit of a value is set is faster than testing whether a value is non-zero. For example, many embedded controllers have a "jump if memory bit is set" instruction, but not a "jump if memory is non-zero" instruction. – supercat Nov 03 '15 at 17:52

No. It is not safe (or more specifically, portable). However, it likely works by virtue of the fact that your typical implementation will:

  1. use all-zero bits to represent false (the C++ specification does require false to convert to 0, though not that it be stored as all-zero bits)
  2. generate an array of elements that memset() can deal with.

However, best practice would dictate using bool data[32] = {false}. Additionally, this may free the compiler to represent the structure differently internally, since using memset() could result in it generating a 32-byte array of values rather than, say, a single 4-byte word that fits nicely within your average CPU register.
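For comparison, a sketch of alternatives that do not depend on how false is represented: value-initialization with empty braces, and std::fill on an existing array. (bool data[32] = {false} also yields 32 false elements; the {true} case below shows why naming a value in a partial initializer list can mislead.) The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Counts the elements of a bool array that are true.
template <std::size_t N>
std::size_t count_true(const bool (&data)[N]) {
    return static_cast<std::size_t>(
        std::count(std::begin(data), std::end(data), true));
}

// Empty braces value-initialize every element to false.
std::size_t with_empty_braces() {
    bool data[32] = {};
    return count_true(data);
}

// std::fill assigns false to each element of an existing array, so it can
// replace the memset call without assuming anything about bit patterns.
std::size_t with_fill() {
    bool data[32];
    std::fill(std::begin(data), std::end(data), false);
    return count_true(data);
}

// Pitfall: a partial initializer list only sets the elements it names; the
// rest are value-initialized. {true} makes data[0] true and the other 31 false.
std::size_t with_true_in_braces() {
    bool data[32] = {true};
    return count_true(data);
}
```

with_empty_braces() and with_fill() both return 0, while with_true_in_braces() returns 1.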

Olipro
  • Careful; `bool data[32] = {false}` may work (read: it will; always) but it's also slightly misleading. It is not equivalent to `bool data[32] = {false, false, false, ...}` but to `bool data[32] = {false, 0, 0, 0, 0}`. The real saving grace here is that the `0` will assuredly implicitly convert to `false` anyway, but it does mean that naming `false` is a bit of a red herring, [which may give someone a big surprise one day](http://stackoverflow.com/q/14797810/560648). As such, `bool data[32] = {}` would be my preference. – Lightness Races in Orbit Oct 28 '15 at 00:41
  • Upon further reading I've verified that the C++ specification mandates that false evaluate to zero - therefore the relevant piece of my answer is the fact that the internal representation can be more efficient. So whilst you are correct, it still is guaranteed *by the specification* to initialize all elements to false. – Olipro Oct 28 '15 at 02:33
  • What makes good code is largely subjective. I don't see any more problem with `{false}` for initializing a bool array than `for(;;)` to loop infinitely - it's all verifiably compliant against the standard. – Olipro Nov 02 '15 at 00:13
  • I explained, quite clearly, and with a link to more information, what the problem is. If you go through your career writing unclear code just because it's "verifiably compliant against the standard", I hope I don't have to maintain it! – Lightness Races in Orbit Nov 02 '15 at 00:30
  • `{false}` is completely clear provided you know what C++ expands the statement to. If you make the assumption that you could change it to `{true}` to switch to initializing everything to that value, that's your problem. In fact, perhaps more to the point, in this case the *actual* solution is to provide a nice `//comment` explaining what you're doing if you're worried about a less-experienced developer coming across it. – Olipro Nov 02 '15 at 00:36
  • Trying to relate inappropriate keyword usage to specifying the first value in an initializer list is really not a good way to emphasize your point; they are two very different beasts. Namely, in the argument above, the former conveys an intent and the latter does not. In any case, this discussion is off-topic; if you wish to continue it, let's do so in chat. – Olipro Nov 02 '15 at 10:39
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/94016/discussion-between-olipro-and-lightness-races-in-orbit). – Olipro Nov 02 '15 at 19:20
  • @LightnessRacesinOrbit: Is there any reason why an initializer should make it a definition rather than a declaration? I've seen some compilers will allow `extern char foo[] = "HELLO";` as equivalent to `extern char foo[6];`--allowing any module to use `sizeof foo` to get the length. Is there any way to accomplish such a thing otherwise? – supercat Nov 03 '15 at 17:57
  • @supercat: There needs to be one translation unit in your program where the object "lives", and the definition (by definition) specifies that. Could the initialiser be on the declaration rather than the definition, in principle? Sure. I only know that this is not the case for `extern`. – Lightness Races in Orbit Nov 03 '15 at 19:01
  • @LightnessRacesinOrbit: The compiler that accepted the above construct (C, not C++, so the rules might be different, though) required that exactly one point of declaration in the project lack the "extern", but that was handled via a macro. Otherwise I know of no way of making the size of the object available in other translation units. – supercat Nov 03 '15 at 19:08

Update

P1236R1: Alternative Wording for P0907R4 Signed Integers are Two's Complement says the following:

As per EWG decision in San Diego, deviating from P0907R3, bool is specified to have some integral type as its underlying type, but the presence of padding bits for "bool" will remain unspecified, as will the mapping of true and false to values of the underlying type.

Original Answer

I believe this is unspecified, although it seems likely that the underlying representation of false would be all zeros. Boost.Container relies on this as well (emphasis mine):

Boost.Container uses std::memset with a zero value to initialize some types as in most platforms this initialization yields to the desired value initialization with improved performance.

Following the C11 standard, Boost.Container assumes that for any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type. Since _Bool/wchar_t/char16_t/char32_t are also integer types in C, it considers all C++ integral types as initializable via std::memset.

The C11 text they point to as a rationale actually comes from a C99 defect report, defect 263: all-zero bits representations, which added the following:

For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
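The shortcut Boost.Container describes can be sketched as a type-gated helper. This is a hypothetical illustration under that same all-bits-zero assumption, not Boost's actual implementation or API: integral types (which in C++ include bool) are zeroed with std::memset, and everything else is value-initialized element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper, NOT Boost's code: value-initialize n objects of type T.
namespace detail {
// Integral types, bool included: take the memset shortcut. This is only
// correct if all-bits-zero represents zero (or false), which is the
// C-derived assumption Boost.Container documents.
template <typename T>
void zero_init(T* first, std::size_t n, std::true_type /*is integral*/) {
    std::memset(first, 0, n * sizeof(T));
}

// Everything else: assign a value-initialized T element by element.
template <typename T>
void zero_init(T* first, std::size_t n, std::false_type /*not integral*/) {
    for (std::size_t i = 0; i < n; ++i) first[i] = T();
}
}  // namespace detail

template <typename T>
void zero_initialize(T* first, std::size_t n) {
    // std::is_integral<T> derives from true_type or false_type,
    // which selects one of the overloads above.
    detail::zero_init(first, n, std::is_integral<T>{});
}
```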

So the question here is whether the assumption is correct: are the underlying object representations of integers compatible between C and C++? The proposal Resolving the difference between C and C++ with regards to object representation of integers sought to answer this to some extent, but as far as I can tell it was not resolved. I cannot find conclusive evidence either way in the draft standard. There are a couple of places where it links to the C standard explicitly with respect to types. Section 3.9.1 [basic.fundamental] says:

[...] The signed and unsigned integer types shall satisfy the constraints given in the C standard, section 5.2.4.2.1.

and 3.9 [basic.types] which says:

The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values.[44]

where footnote 44 (which is not normative) says:

The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.
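Because the object representation is defined in terms of unsigned char objects, it can be inspected without type-punning. A minimal sketch (the helper names are illustrative): the code is well-defined, but the bytes it reports for false and true are whatever the implementation chooses.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Copy out the object representation of b: the sizeof(bool) unsigned char
// objects it occupies. Examining an object's bytes as unsigned char is
// allowed, but which bytes you get for false and true is up to the
// implementation.
std::array<unsigned char, sizeof(bool)> object_representation(bool b) {
    std::array<unsigned char, sizeof(bool)> bytes{};
    std::memcpy(bytes.data(), &b, sizeof b);
    return bytes;
}

// True if every byte of b's object representation is zero.
bool has_all_bits_zero(bool b) {
    for (unsigned char byte : object_representation(b)) {
        if (byte != 0) return false;
    }
    return true;
}
```

On mainstream compilers has_all_bits_zero(false) returns true and true is typically stored as the single byte 0x01, but nothing in the wording quoted here requires either result.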

The closest the draft standard gets to specifying the underlying representation of bool is in section 3.9.1:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types.[50] A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.[51] [ Example: this International Standard permits 2’s complement, 1’s complement and signed magnitude representations for integral types. —end example ]

the section also says:

Values of type bool are either true or false.

but all we know of true and false is:

The Boolean literals are the keywords false and true. Such literals are prvalues and have type bool.

and we know they are convertible to 0 and 1:

A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.

but this gets us no closer to the underlying representation.

As far as I can tell, the only place where the standard referred to an actual underlying bit value, other than padding bits, was removed via defect report 1796, "Is all-bits-zero for null characters a meaningful requirement?":

It is not clear that a portable program can examine the bits of the representation; instead, it would appear to be limited to examining the bits of the numbers corresponding to the value representation (3.9.1 [basic.fundamental] paragraph 1). It might be more appropriate to require that the null character value compare equal to 0 or '\0' rather than specifying the bit pattern of the representation.

There are more defect reports that deal with the gaps in the standard with respect to what a bit is and the difference between the value and object representations.

In practice I would expect this to work, but I would not consider it safe, since we cannot nail it down in the standard. Whether you need to change it is not clear; you clearly have a non-trivial trade-off involved. Assuming it works now, the question is whether it is likely to break with future versions of various compilers, and that is unknown.

Shafik Yaghmour
  • Does the standard say that POD data can be ::memset() to 0? – Slava Oct 29 '15 at 15:48
  • @Slava well [this](https://isocpp.org/wiki/faq/cpp11-language-types#generalized-pods) says we can memset a POD but the details are not really spelled out and they don't seem to be spelled out in the standard either. As far as I can tell this is covered in *[basic.types]*. – Shafik Yaghmour Oct 29 '15 at 18:37
  • Then, as bool can be part of a POD, that implicitly puts a requirement on its binary representation IMHO – Slava Oct 29 '15 at 19:12
  • @Slava the problem is we have `Values of type bool are either true or false.` so if we assume, that in the underlying representation that `0` is false and `1` is true(*which we can't prove*) what about other values? Are they true or false? We can't say, sounds undefined and so it seems like a defect or just unspecified. – Shafik Yaghmour Oct 29 '15 at 19:18
  • Does `::memset( &intvar, 0, sizeof( int ) )` guarantee that result is the same as `intvar = int{};`? Should it be the same for `boolvar = bool{};` ? – Slava Oct 29 '15 at 19:22
  • @Slava not that I can tell; if that were the case then the Boost::Container folks would have just said that instead of relying on C11 for a rationale, and that rationale only specifically covers the representation of `0`. – Shafik Yaghmour Oct 29 '15 at 19:38
  • [This answer](http://stackoverflow.com/a/11139915/410767) suggests C11 didn't pick up the stipulation from the C99 defect report. ***If*** the opening sentence "I believe this unspecified although..." is intended in the general English sense of "not stipulated by the Standard", perhaps it should be changed so it can't be misread as a conclusion that the behaviour is unspecified in the Standardese sense of one-of-many-sane-behaviours distinct from undefined behaviour. Cheers – Tony Delroy Apr 02 '16 at 01:07

From 3.9.1/7:

Types bool, char, char16_t, char32_t, wchar_t, and the signed and unsigned integer types are collectively called integral types. A synonym for integral type is integer type. The representations of integral types shall define values by use of a pure binary numeration system.

Given this I can't see any possible implementation of bool that wouldn't represent false as all 0 bits.

Mark B
  • Well you could implement false as 1 and true as 0 in *memory*. So long as the compiler cleverly manages all the required conversions in the code. It is similar to how null pointers are not necessarily 0 in memory. – Neil Kirk Oct 28 '15 at 00:20
  • No, the standard has no intention to restrict the representation of `bool` like that. `bool` values are guaranteed to convert to `0` and `1` but otherwise are not guaranteed to be in any way related to `0` and `1` by representation. – AnT stands with Russia Oct 28 '15 at 00:27
  • @AnT Is there any supporting text for your claim? Merely noting that the standard doesn't appear to specify the relation, is a bit thin for my liking. It *does* say that bool uses a pure binary numeration system, as in Mark's quote, but it is unclear to me what that means for a bool. – M.M Oct 28 '15 at 00:29
  • Nothing in this quote says that `bool` cannot be represented by, say, all ones. It just says that the bits must be either 0 or 1. -1 – Lightness Races in Orbit Oct 28 '15 at 00:32
  • @M.M: Well, the text you quoted simply doesn't say that `false` must be represented as integral `0`. Basically, it is you who have to provide supporting text for your claim. The text you quoted allows for `false` being internally represented as `66` and `true` being internally represented as `42`, as long as both representations follow the mandatory "pure binary numeration system". – AnT stands with Russia Oct 28 '15 at 00:42
  • That's right. All this quote says is "these types must be natively representable on any computer made by humans since the latter half of the 20th century". If _that's_ not "thin" then I don't know what is. – Lightness Races in Orbit Oct 28 '15 at 00:43
  • @LightnessRacesinOrbit bits are 0 or 1 by definition of the term "bit". – M.M Oct 28 '15 at 00:46
  • @M.M: Yes, I agree. This particular sentence in the standard is remarkably pointless. – Lightness Races in Orbit Oct 28 '15 at 00:48
  • @AnT For the other integer types, "pure binary numeration system" means that int `1` has to be represented as `000...001` , `42` as `000...01010010` and so on. It means more than "any series of bits" as you are suggesting; the footnote goes into detail. – M.M Oct 28 '15 at 00:49
  • @M.M: That still says absolutely nothing about `true` vs `false`. It only defines how bitwise arithmetic shall work over the representation values. You are projecting your view that "true is more than false" on to that footnote. There are indeed several schemes that use the opposite representation: POSIX shell conventions and SNMP, to name just two. C++ assuredly does not mandate it either way. – Lightness Races in Orbit Oct 28 '15 at 00:49
  • @LightnessRacesinOrbit disagree, I think it is constraining implementations to the pattern I demonstrated (certainly for non-negative values anyway). Footnote 51 clarifies this intent. I don't have a view that "true is more than false" (I have not claimed any particular representation for `bool`) – M.M Oct 28 '15 at 00:53
  • @M.M: I cannot respond to "despite your counter-argument, I'm still right". – Lightness Races in Orbit Oct 28 '15 at 00:55
  • @M.M there is a lot of fuzziness in the standard with [respect to what bit refers to in various places](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1857). – Shafik Yaghmour Oct 28 '15 at 20:18