159

In C++, I'm wondering why the bool type is 8 bits long (on my system), when a single bit is enough to hold a boolean value?

I used to believe it was for performance reasons, but on a 32-bit or 64-bit machine, where registers are 32 or 64 bits wide, what's the performance advantage?

Or is it just one of those 'historical' reasons?

fjellfly
Jérôme
  • 14
    A bool is not 8-bits on my system. It's 4 bytes, the same as an int. – Brian Neal Jan 14 '10 at 14:44
  • Brian, what system do you have? I thought the same thing, but when I tried, both msvc and gcc gave me `sizeof(bool)==1`. – avakar Jan 14 '10 at 14:48
  • @avakar: an embedded system (e.g. 8-bit CPU) would suffice. – jldupont Jan 14 '10 at 14:58
  • 24
    last time someone thought what you're thinking, we ended up with std::vector<bool>, the most hated stl "feature" ever =) – Viktor Sehr Jan 14 '10 at 15:13
  • 1
    jldupont, I think you misread me. I was asking for a system, where `sizeof(bool)` would be 4. I could swear that msvc had 32-bit bools, but I just tried and it doesn't. – avakar Jan 14 '10 at 16:19
  • 8
    To be fair, the problem with `vector<bool>` isn't that it tries to be clever and pack bools into bits, but that it tries to do this *and disguise itself as a STL container*. A plain bitset would have been fine as long as it doesn't also pretend to be a STL container. – jalf Jan 14 '10 at 17:08
  • 2
    @avakar - you might be confusing the C++ `bool` data type with the Windows `BOOL` type which is typedefed to `long`. So `sizeof(bool) != sizeof(BOOL)`, which I'm sure causes a lot of confusion (and probably a fair number of bugs). Particularly since there are also `boolean` and `BOOLEAN` typedefs in Windows, which are aliases for `unsigned char`. Also, note that while it's common for `bool` to be 1 byte, the C++ standard has a note that specifically indicates that `sizeof(bool)` can be larger. – Michael Burr Jan 15 '10 at 04:10
  • 1
    Michael, thanks, yes, I'm aware of what the standard says. You might be right that my confusion could be caused by `BOOL` (which by the way is typedefed to `int`, not `long`). I'll know better now :) – avakar Jan 15 '10 at 10:44
  • @jalf: .. or if we had proper 'bit pointers' – lorro Jul 28 '16 at 13:24

7 Answers

249

Because every C++ data type must be addressable.

How would you create a pointer to a single bit? You can't. But you can create a pointer to a byte. So a boolean in C++ is typically byte-sized. It may also be larger; that is up to the implementation. The main thing is that it must be addressable, so no C++ data type can be smaller than a byte.
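As a minimal sketch of the addressability argument: the two `static_assert`s restate what the language guarantees, and `set_through_pointer` is a hypothetical helper (not from the original answer) showing that a `bool` can be written through an ordinary pointer, which is only possible because it occupies at least one whole byte. It assumes a C++14 or later compiler.

```cpp
// sizeof counts in units of char, so sizeof(char) is 1 by definition.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
// bool needs an address of its own, so it cannot be smaller than one byte.
static_assert(sizeof(bool) >= sizeof(char), "bool occupies at least one byte");

// Hypothetical helper: modifies a bool through a pointer to it.
constexpr bool set_through_pointer() {
    bool flag = false;
    bool* p = &flag;  // legal: flag has an address of its own
    *p = true;        // write through the pointer
    return flag;      // the original object was changed
}
```

The exact value of `sizeof(bool)` is left to the implementation; only the lower bound is fixed.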

jalf
  • 8
    "byte" addressing is an architectural choice (hw level): one could very well design a system with a different "unit of addressing". For common processors, addressing a "byte" anyhow ends-up fetching more than a "byte" from external memory: this is due to efficiency reasons. – jldupont Jan 14 '10 at 14:34
  • 10
    Yes, it's a hardware choice, and if the hardware allows for it, the size of a bool could change. But the OP asked why a bool is 8 bits wide, and on systems where that is the case, it is generally because the CPU is only able to address 8-bit bytes. – jalf Jan 14 '10 at 14:46
  • On many systems a bool is the same size as an int by default, as it is much more efficient to fetch the native word size from memory. And you don't have to do shifting to get to the byte of interest. But like you said, this is an implementation detail, and can often be set using compiler switches. – Brian Neal Jan 14 '10 at 14:48
  • 3
    @jldupont: There are a few systems where pointer addresses are finer grained than bytes (I've programmed on the old TI TMS34010/20 before, which uses bit-wise pointers), but they are EXCEEDINGLY rare. – Michael Kohne Jan 14 '10 at 14:49
  • Because it doesn't matter. The OP didn't ask about those EXCEEDINGLY rare machines. I downvoted you because as detailed as your answer was, it left out the one *true* reason: That C++ requires that every object is addressable. And rather than craft special rules to allow sub-byte datatypes on the EXCEEDINGLY rare systems where it is possible, they simplified it by saying that the fundamental addressing unit in C++ is the byte (where the size of a byte is implementation-defined). And a char is 1 byte, so smaller datatypes aren't possible – jalf Jan 14 '10 at 15:03
  • I don't need a trip down memory lane. I know full well that some machines allow bit addressing. I'm not saying that's impossible or implausible. Just that it *doesn't matter to the OP's question*. – jalf Jan 14 '10 at 15:04
  • if this is true, why char data type is 1 byte ( 256 characters can store in 8 bit ( and no additional space for storing memory address ) ) ??? – Michel Gokan Khan Jan 14 '10 at 15:28
  • 1
    Not sure what you mean. Every object must be addressable, that is, it must be possible to retrieve the address of an object. The object doesn't have to store its own address. A char is typically 8 bits wide, enough to store any of 256 characters, but each char also has an address defined by where in memory it is located. That is why you can create a pointer to a char. – jalf Jan 14 '10 at 15:29
  • @jalf : what about int data type ? size of int types( in my GCC compiler ) is 4 bytes and a Unsigned int can stores 2**32 number ( 0 to 4294967295 ) ... so there is no additional byte for storing memory address ... why bool data type have additional bits for storing memory address but int or char NOT ? – Michel Gokan Khan Jan 14 '10 at 15:56
  • 1
    Bool doesn't have additional bits for storing the memory address. But because a memory address only points to a byte, not to an individual bit, every data type must take up at least one byte. If a bool was stored in the third bit of a byte somewhere, we would have no way to create a pointer to it. But if it takes up the entire byte, we can create a pointer to that. – jalf Jan 14 '10 at 16:21
  • 94
    If I may contribute a dodgy analogy: there are eight floors in my building, but the Post Office doesn't acknowledge that they are different addresses. So if I want an address all to myself, then I have to rent the whole building, even though I actually fit on one floor. I'm not using the other seven floors to "store an address", I'm just forced to waste them because of the Post Office rule that addresses refer to buildings, not floors. C++ objects must have an address to themselves - no post rooms to sort the mail after delivery ;-) – Steve Jessop Jan 14 '10 at 17:27
  • 1
    Please use more precise term "octet" if you mean 8-bit data type, because "byte" is not necessarily 8-bits and it leads to confusion, when some people refer to 8-bit data type and others refer to smallest chunk of addressable memory. – mip Apr 25 '16 at 16:35
42

Memory is byte-addressable. You cannot access a single bit without shifting or masking the byte read from memory. I would imagine this is a large part of the reason.
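The shifting and masking mentioned above can be sketched as follows. The names `get_bit` and `set_bit` are made up for illustration, and bit 0 is assumed to be the least significant bit.

```cpp
#include <cstdint>

// Read bit `index` (0 = least significant) of a byte.
// The whole byte is loaded, shifted right, then masked down to one bit.
constexpr bool get_bit(std::uint8_t byte, unsigned index) {
    return ((byte >> index) & 1u) != 0;
}

// Return a copy of `byte` with bit `index` set to `value`.
// Again the whole byte is read and rewritten; only one bit differs.
constexpr std::uint8_t set_bit(std::uint8_t byte, unsigned index, bool value) {
    return value ? static_cast<std::uint8_t>(byte | (1u << index))
                 : static_cast<std::uint8_t>(byte & ~(1u << index));
}
```

For example, `get_bit(0x04, 2)` is true, and `set_bit(0xFF, 0, false)` gives `0xFE`.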

25

A boolean type normally follows the smallest unit of addressable memory of the target machine (i.e. usually the 8-bit byte).

Access to memory is always in "chunks" (multiples of words; this is for efficiency at the hardware level, in bus transactions): a boolean bit cannot be addressed "alone" in most CPU systems. Of course, once the data is in a register, there are often specialized instructions to manipulate bits independently.

For this reason, it is quite common to use "bit packing" techniques to store "boolean" data more efficiently. An enum (in C) with power-of-2 values is a good example. The same sort of trick is found in most languages.
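A rough sketch of the power-of-2 enum technique follows. The `Permission` enumerators and the helpers `add_flag` and `has_flag` are invented for this example; the point is only that each enumerator owns one bit, so several booleans share a single byte.

```cpp
#include <cstdint>

// Each enumerator is a distinct power of two, i.e. a distinct bit.
enum Permission : std::uint8_t {
    kRead    = 1u << 0,  // binary 001
    kWrite   = 1u << 1,  // binary 010
    kExecute = 1u << 2   // binary 100
};

// Turn a flag on by OR-ing its bit into the packed byte.
constexpr std::uint8_t add_flag(std::uint8_t flags, Permission p) {
    return static_cast<std::uint8_t>(flags | p);
}

// Test a flag by AND-ing with its bit.
constexpr bool has_flag(std::uint8_t flags, Permission p) {
    return (flags & p) != 0;
}
```

With these, `add_flag(add_flag(0, kRead), kExecute)` yields 5 (binary 101), for which `has_flag` reports `kRead` and `kExecute` but not `kWrite`.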

Updated: Thanks to an excellent discussion, it was brought to my attention that sizeof(char)==1 by definition in C++. Hence, addressing of a "boolean" data type is closely tied to the smallest unit of addressable memory (which reinforces my point).

jldupont
  • For all the comments you left about this, it's impressive that you left out the most important part of the answer: A `bool` type follows the smallest unit of allocatable memory **because C++ requires that it must be possible to create pointers to it**. Without that requirement, a `bool` could conceivably have been represented as a single bit even on current byte-addressable machines. – jalf Jan 14 '10 at 14:50
  • 1
    hmmm... I could craft a CPU architecture where a bit could be addressable... I could even write a compiler etc. for it. I could have a special region of memory (or whatever) that would be "bit addressable". It is not by any stretch of the imagination impossible. – jldupont Jan 14 '10 at 14:53
  • 2
    Yes, and on that system, a bool could be made to be a single bit. But the OP didn't ask "why is a bool 8 bits wide on jlduponts hypothetical CPU". He asked about current, common, everyday CPUs, and on those, it is because they are byte-addressable. – jalf Jan 14 '10 at 14:58
  • 4
    sizeof(char)==1 per definition in C++, so what your hardware can or can not do is not relevant. You can't have sizeof(bool) < sizeof(char). BTW C++ is defined in such a way that you can have "fat" pointer to address some subunit of what the hardware can address if it isn't convenient to have char the smallest hardware addressable unit. This has been used at least in some C compilers for old word addressable architectures. – AProgrammer Jan 14 '10 at 15:01
  • @AProgrammer: `sizeof(char)==1 definition` : that's the best counter-argument to my argumentation. Thanks! – jldupont Jan 14 '10 at 15:04
  • @jldupont: He did mention 32/64-bit systems. Apart from that, I just saw nothing to indicate that he was asking "why is it a natural law that bool must be 8 bits on **every** conceivable CPU". I read it as "why is it 8 bits on the systems where it *is* 8 bits", which at least makes sense. And with the edit, I just gave you an upvote. :) I just felt most of the rest of the post was missing the point. – jalf Jan 14 '10 at 15:25
  • Hmm... I suppose there's no reason you couldn't have a pointer to a bit; the pointer type would need to be 3 bits wider than the usual pointer types, is all. Now to find a computer that can handle 35-bit (or 67-bit) words natively.... – Jeremy Friesner Jan 15 '10 at 11:03
  • @Jeremy: IBM 704 has 36bits words. Many DSP processors have "strange" word length too. I can bet you that many custom processors used in security devices use that trick too. Once, long ago, I have designed a CPU with some pretty strange (but today's standards) properties. – jldupont Jan 15 '10 at 11:11
  • `sizeof(char) == 1 byte` along with the requirement that the `char` data type has to be capable of holding at least 256 characters means that `bool` also has to be at least 8 bits wide, even if the architecture allows finer addressing. I wonder why the `sizeof()` operator does not return the number of bits instead of the number of bytes, whilst byte size may vary from architecture to architecture, making it a non-portable feature in many cases. [Add Intel 4004 to the list of weird CPUs. It was 4-bit.] – mip Apr 26 '16 at 11:41
  • 1
    @doc _"I wonder why sizeof() operator does not return number of bits instead of number of bytes, whilst byte size may vary from architecture to architecture making it not portable feature in many cases_" - It is perfectly portable to get the number of bits spanned by a type by just multiplying its `sizeof` by the standard macro `CHAR_BIT`. – underscore_d Jan 17 '17 at 01:39
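The `CHAR_BIT` multiplication from the last comment can be sketched like this. The function name `bits_spanned_by` is hypothetical; `CHAR_BIT` comes from `<climits>`.

```cpp
#include <climits>  // CHAR_BIT: number of bits in a byte on this platform
#include <cstddef>  // std::size_t

// Number of bits of storage spanned by an object of type T.
template <typename T>
constexpr std::size_t bits_spanned_by() {
    return sizeof(T) * CHAR_BIT;
}
```

`bits_spanned_by<char>()` always equals `CHAR_BIT`, while `bits_spanned_by<bool>()` is some multiple of it chosen by the implementation.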
7

The answers about 8 bits being the smallest amount of memory that is addressable are correct. However, some languages can use 1 bit for booleans, in a way. I seem to remember Pascal implementing sets as bit strings. That is, for the following set:

{1, 2, 5, 7}

You might have this in memory:

01100101

You can, of course, do something similar in C / C++ if you want. (If you're keeping track of a bunch of booleans, it could make sense, but it really depends on the situation.)
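One way to do something similar in C++ is `std::bitset`, sketched here under the assumption that element `i` of the set maps to bit `i`. The names `SmallSet` and `kExample` are illustrative only. Note that `std::bitset::to_string()` prints the highest bit first, so it would show the mirror image of the layout above.

```cpp
#include <bitset>

// A set over the universe {0..7}: element i is present when bit i is set.
using SmallSet = std::bitset<8>;

// The set {1, 2, 5, 7}: bits 1, 2, 5 and 7 are set.
// As a number that is 2 + 4 + 32 + 128 = 166 = 0xA6.
constexpr SmallSet kExample(0xA6ull);

// Membership is a single bit test; all eight "booleans" share one bitset.
constexpr bool contains(const SmallSet& s, unsigned element) {
    return s[element];
}
```

Here `contains(kExample, 5)` is true and `contains(kExample, 3)` is false.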

Benjamin Oakes
  • 9
    In fact, C++ does this with the specialised container vector<bool> - it is commonly seen as a disaster. –  Jan 14 '10 at 14:23
  • C++ also does this with "bit fields," inherited from C. When declaring a member variable of a struct/class, you can declare the number of bits used to store the value (e.g., "unsigned short field : 3"). –  Jan 14 '10 at 14:28
  • @Neil : why is it commonly seen as a disaster ? Is it a performance problem ? – Jérôme Jan 14 '10 at 14:30
  • 2
    @Jerome: It's because, since a bit is not addressable, it can't behave the same way as a regular `vector`. It isn't actually an STL-type container, because there are constraints on the behavior. What's worse is that it causes problems with somebody having `bool`s and wanting to make a `vector` of them. It is surprising behavior, and that's not what you want in a language. – David Thornley Jan 14 '10 at 14:36
  • Well, there are possible performance problems, but basically it will no longer work like a C++ STL container (particularly with iterators) precisely because bits are not addressable. –  Jan 14 '10 at 14:37
  • 1
    @jldupont - it's sufficient to make a point like this once. And C++ makes no guarantee that bits are addressable (rather the reverse), no matter what the hardware is capable of. –  Jan 14 '10 at 14:43
2

I know this is old but I thought I'd throw in my 2 cents.

If you limit your boolean or data type to one bit, then your application is at risk of memory corruption. How do you handle error stats in memory that is only one bit long?

I went to a job interview, and one of the things the program lead said to me was, "When we send the signal to launch a missile we just send a simple one-bit on/off signal via wireless. Sending one bit is extremely fast and we need that signal to be as fast as possible."

Well, it was a test to see if I understood the concepts of bits, bytes, and error handling. How easy would it be for a bad guy to send out a one-bit message? Or what happens if the bit gets flipped during transmission?

Cire
  • Ask [new question](http://stackoverflow.com/questions/ask), don't post your question as answer to other questions. – Igor Jerosimić Feb 14 '13 at 18:44
  • 6
    I think the question contained in this "answer" is actually a rhetorical one, i.e. the reason we don't implement booleans as one bit is because a single bit cannot handle error stats. – Stephen Holt Mar 11 '13 at 15:45
  • 1
    @StephenHolt but that's not the reason and TBH this answer doesn't make any sense. – mip Apr 26 '16 at 11:17
  • 1
    ...what? I don't know what you mean by "error stats", whether CRCs or suchlike, or trap representations. But in any case, even the larger types do not use their extra, 'spare' bits for "error stats" as all but extreme-environment coders rightly assume their hardware can handle error detection/correction before their code ever reads memory, so they needn't spend their time somehow padding every variable with verification info or whatever. That's not why `bool` uses 8 bits on OP's machine and 32 on mine, as those other 7 or 31 bits certainly aren't used for any "error stats". This makes no sense – underscore_d Jan 17 '17 at 01:46
1

Some embedded compilers have an int1 type that is used to bit-pack boolean flags (e.g. the CCS series of C compilers for Microchip MPUs). Setting, clearing, and testing these variables compiles to single bit-level instructions, but the compiler will not permit any other operations (e.g. taking the address of the variable), for the reasons noted in other answers.
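The `int1` type is compiler-specific, so the sketch below uses standard bit-fields instead, as a loose analogue rather than the CCS syntax. The struct `StatusFlags` and the helper `error_flag_roundtrip` are made up for illustration, and C++14 or later is assumed. As with `int1`, the address of a bit-field member cannot be taken.

```cpp
// Three one-bit flags packed together. How they are laid out in memory
// is implementation-defined.
struct StatusFlags {
    unsigned ready   : 1;
    unsigned error   : 1;
    unsigned enabled : 1;
};

// Hypothetical helper: set one flag and check that the neighbours are untouched.
constexpr bool error_flag_roundtrip() {
    StatusFlags f{0, 0, 0};
    f.error = 1;                 // set a single bit
    // unsigned* p = &f.error;   // ill-formed: a bit-field has no address
    return f.error == 1 && f.ready == 0 && f.enabled == 0;
}
```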

EBlake
0

Note, however, that std::vector<bool> is allowed to use bit-packing, i.e. to store the bits in smaller units than an ordinary bool. But it is not required.
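A small sketch of what this means in practice: whether or not the bits are actually packed, `std::vector<bool>::reference` is specified as a proxy class rather than `bool&`. The helper `flip_first` is hypothetical and assumes a non-empty vector.

```cpp
#include <type_traits>
#include <vector>

// An ordinary vector hands out real references to its elements.
static_assert(std::is_same<std::vector<int>::reference, int&>::value,
              "vector<int>::reference is int&");
// vector<bool> hands out a proxy object instead, since a packed bit
// has no address that a bool& could refer to.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is not bool&");

// Hypothetical helper: flips element 0 through the proxy.
// Precondition: v is not empty.
inline bool flip_first(std::vector<bool>& v) {
    auto bit = v[0];     // a proxy referring to the element, not a copy of it
    bit = !bit;          // assignment writes through to the vector
    // bool* p = &v[0];  // does not compile: there is no bool to point at
    return v[0];
}
```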

Dag B