209

In C++,

  • Why is a boolean 1 byte and not 1 bit in size?
  • Why aren't there types like 4-bit or 2-bit integers?

I'm missing these things while writing an emulator for a CPU

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Asm
  • 2,101
  • 2
  • 13
  • 4
  • 13
    In C++ you can "pack" the data by using bit-fields. `struct Packed { unsigned int flag1 : 1; unsigned int flag2: 1; };`. Most compilers will allocate a full `unsigned int`, however they deal with the bit-twiddling by themselves when you read / write. Also they deal by themselves with the modulo operations. That is a `unsigned small : 4` attribute has a value between 0 and 15, and when it should get to 16, it won't overwrite the preceding bit :) – Matthieu M. Jan 07 '11 at 15:15
  • 1
    But note / beware that it's *not* thread-safe for different threads to write adjacent bitfields in the same object. It *is* thread-safe for them to write separate `bool` members of a struct/class. This means compilers are allowed to implement bitfield writes by loading the containing word, doing some bit-manipulation, then just storing the whole word (not doing an atomic CAS). Related: [C++ memory model and race conditions on char arrays](https://stackoverflow.com/q/19903338) - that's why word-addressable machines can't use 1-byte `char` in a C11 or C++11 implementation. – Peter Cordes Dec 12 '21 at 03:01

13 Answers

309

Because the CPU can't address anything smaller than a byte.

Paul Tomblin
  • 179,021
  • 58
  • 319
  • 408
  • 52
    Actually, the four x86 instructions `bt`, `bts`, `btr` and `btc` *can* address single bits! – fredoverflow Jan 07 '11 at 16:07
  • 16
    I think `bt` addresses a byte offset and then tests the bit at a given offset, regardless, when specifying an address you go in bytes...bit offset literals would get a *bit wordy* (excuse the pun). – user7116 Jan 07 '11 at 16:12
  • 3
    @six: You can load the beginning of an array in one register and then the relative "bit offset" into a second. The bit offset is not limited to "within one byte", it can be any 32 bit number. – fredoverflow Jan 07 '11 at 16:15
  • Gotcha, I hadn't used `bt` with a register, only with an 8 bit immediate. – user7116 Jan 07 '11 at 16:34
  • 5
    Well, yes and no. We do have bitfields, and we could have a bitfield pointer, that is address + bit number. Obviously, such a pointer would not be convertible to void* because of the extra storage requirement for the bit number. – Maxim Egorushkin Jan 07 '11 at 17:10
  • What a waste of resources though. – gEdringer Sep 05 '16 at 10:02
  • 3
    @gEdringer if you're trying to cram as much information into sub-byte fields as you can fit, there are always bitfields. – Paul Tomblin Sep 06 '16 at 12:25
  • 1
    some CPUs/MCUs like 8051 do have bit-addressable memory and can address single bits. Besides many architectures like ARM or PowerPC have instructions to access bitfields so you can access single bits efficiently – phuclv Jan 20 '21 at 16:53
  • @phuclv unless you can make a pointer to it, you can't make it a datatype. – Paul Tomblin Jan 21 '21 at 13:26
57

From Wikipedia:

Historically, a byte was the number of bits used to encode a single character of text in a computer and it is for this reason the basic addressable element in many computer architectures.

So the byte is the basic addressable unit, below which a computer architecture cannot address. And since there (probably) don't exist computers that support a 4-bit byte, you don't have a 4-bit bool, etc.

However, if you could design an architecture whose basic addressable unit is 4 bits, then you would have a 4-bit bool — on that computer only!

Nawaz
  • 353,942
  • 115
  • 666
  • 851
  • 7
    "you will have int of size 4-bit then, on that computer only" - no you won't, because the standard forbids CHAR_BIT from being less than 8. If the addressable unit on the architecture is less than 8 bits, then a C++ implementation will just have to present a memory model that's different from the underlying hardware's memory model. – Steve Jessop Jan 07 '11 at 16:51
  • @Steve : oops... I overlooked that. Removed `int` and `char` from my post. – Nawaz Jan 07 '11 at 16:58
  • 2
    you can't have a 4-bit `bool` either, because the `char` is the smallest addressable unit *in C++*, regardless of what the architecture can address with its own opcodes. `sizeof(bool)` must have a value of at least 1, and adjacent `bool` objects must have their own addresses *in C++*, so the implementation just has to make them bigger and waste memory. That's why bit fields exist as a special case: the bitfield members of a struct aren't required to be separately addressable, so they can be smaller than a `char` (although the whole struct still can't be). – Steve Jessop Jan 08 '11 at 12:59
  • 1
    @ Steve Jessop : that seems interesting. could you please give me the reference from the language specification where it says `char` is the smallest addressable unit in C++? – Nawaz Jan 08 '11 at 13:09
  • 4
    closest specific statement is probably 3.9/4: "The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T)". Obviously `sizeof(bool)` can't be 0.5 :-) I suppose an implementation could legally provide sub-byte pointers as an extension, but "ordinary" objects like bool, allocated in ordinary ways, have to do what the standard says. – Steve Jessop Jan 08 '11 at 15:43
  • I think 8 bits was also chosen for the instruction set. That gives you 256 possible instructions which is enough to implement a good CPU (like the 6502 and Z80). Less bits and you have to use tricks or multiple _bytes_ to implement instructions which makes the CPU more complicated. – Alexis Wilke Aug 26 '19 at 23:57
26

Back in the old days when I had to walk to school in a raging blizzard, uphill both ways, and lunch was whatever animal we could track down in the woods behind the school and kill with our bare hands, computers had much less memory available than today. The first computer I ever used had 6K of RAM. Not 6 megabytes, not 6 gigabytes, 6 kilobytes. In that environment, it made a lot of sense to pack as many booleans into an int as you could, and so we would regularly use operations to take them out and put them in.

Today, when people will mock you for having only 1 GB of RAM, and the only place you could find a hard drive with less than 200 GB is at an antique shop, it's just not worth the trouble to pack bits.

Jay
  • 26,876
  • 10
  • 61
  • 112
  • 2
    Except when dealing with Flags. Things like Setting multiple options on something... eg. 00000001 + 00000100 = 00000101. – Armstrongest Jan 07 '11 at 21:22
  • 1
    @Atomix: I almost never do this anymore. If I need two flags, I create two boolean fields. I used to write code where I'd pack flags like that and then write "if flags & 0x110 != 0 then" or the like, but this is cryptic and these days I generally make separate fields and write "if fooFlag || barFlag" instead. I wouldn't rule out the possibility of cases where packing flags like that is better for some reason, but it's no longer necessary to save memory like it used to be. – Jay Jan 10 '11 at 18:56
  • Yeah, I was thinking of flags that are already baked into a language. For example, in C# I'm sure RegexOptions uses a flag to set options: `RegexOptions.CultureInvariant + RegexOptions.IgnoreCase` – Armstrongest Jan 11 '11 at 22:01
  • @Atomix: Oh, sure. MS Windows SDK had tons of these built in, so you had no choice but to deal with them. – Jay Jan 12 '11 at 14:03
  • 7
    Actually, it is _quite_ worth your trouble to pack bits, if you want your computation to be fast - on that large amount of data you store in memory. Packing booleans isn't just for smaller storage - it means you can read your boolean input arrays 8 times faster (in terms of bandwidth) as when they're unpacked, and that's often quite significant. Also, you can use bit operations, like popc (population count) which speeds up your work on the CPU itself. – einpoklum May 18 '16 at 12:39
  • @einpoklum If you are writing large arrays of booleans to a file, yes, bandwidth savings could be significant. But it would have to be a binary file, not human readable, which is rather the opposite direction from where we're going these days. Well, I suppose if you're writing an XML file, instead of separate tags for each boolean you could write a single flag field as an integer, but that would be rather contrary to the philosophy of XML. Similarly, you could pack booleans into an int before writing to a DB. IMHO the gain of a small storage and bandwidth saving would be far outweighed ... – Jay May 18 '16 at 13:34
  • ... by the loss in ease of comprehension and maintainability from having a single "flag" field that the programmer has to pick apart rather than separate fields for each flag with a friendly name. (Do DB engines pack booleans? I'll have to check on this.) If you had a thousand booleans in every record, yes, it could be worth it. But more often we have two or three. – Jay May 18 '16 at 13:36
  • @Jay: I didn't say anything about a file. Think in-memory processing of large amounts of data. – einpoklum May 18 '16 at 13:40
  • @einpoklum In memory, I'd think you'd have to have truly huge numbers of booleans for the time to move them around to be a significant percentage of CPU time in any real application. And for that to even be relevant you'd have to be moving or copying them, not just building an array and processing it. So sure, if you have an app where you're manipulating a million booleans, making copies of the list, shuffling it around, whatever, then yes, might possibly be worth it. But for the more normal case where you have, like 20 in the whole app? No. – Jay May 18 '16 at 13:58
  • 4
    Truly huge number of booleans is what you work with every day if you do: DBMSes, machine learning, scientific simulations, and a whole host of other things. And - just _working_ on them means copying them - from memory into cache. A million bools is nothing, think billions. – einpoklum May 18 '16 at 17:33
  • @einpoklum (shrug) I guess the projects you work on are different from mine. I certainly store millions of Booleans on a DB, but I typically load only a tiny fraction of them into memory at a time. – Jay May 19 '16 at 04:11
  • It's not only computers that are being programmed; we have microcontrollers as well, and then we are indeed talking about kilobytes of RAM – André van Schoubroeck Apr 12 '18 at 12:38
  • @AndrévanSchoubroeck Fair enough. If you're working on some highly constrained environment, AND you need to manipulate large numbers of booleans simultaneously, then packing them could be worthwhile. – Jay Apr 12 '18 at 16:11
  • 1
    CPU cache is always "highly constrained", like 32k L1d cache, 256k L2 cache. Having your working set fit in cache is a very big deal. Packing bitmaps is a very good thing for a big Sieve of Eratosthenes, but yes if even unpacked `bool[]` will fit in cache then it's typically better at small sizes. If you use SIMD to handle multiple elements at once, doing 128 bools per 16-byte SIMD vector instead of 16 is a nice speedup (for bitwise AND, or popcount). Or with AVX512, you can compare into a mask, or load a chunk of bitmap from memory and use it as a mask for SIMD vector elements. – Peter Cordes May 31 '20 at 21:31
  • @PeterCordes Maybe some of you folks work on very different kinds of apps than I do. I work mostly on "business apps". So I might say, read the item record for the product the customer ordered, it has a boolean for "is this item taxable", if true add it's cost to the taxable cost counter before calculating sales tax, if false don't. Maybe there's another boolean or two for other attributes of an item. That sort of thing. It is very, very rare for me to manipulate arrays of booleans. Rather, I might have 5 distinct boolean values in the whole program. ... – Jay Jun 01 '20 at 02:24
  • ... They're almost always manipulated individually, not in any sort of groups. So packing and unpacking them would be more work for the CPU, not less. If you have an app where you have some large array of booleans and you have to AND array A with array B bitwise, yes, very different story. I'm struggling to recall a time when I ever wrote such an app, but it's certainly not inconceivable. For IT departments I've worked in, though, that would be a rare to non-existant case. – Jay Jun 01 '20 at 02:26
  • In that case yes, separate booleans, one per byte, is a good design for modern CPUs. The last code I worked on that dealt with arrays of masks was data analysis for [flow cytometry](https://en.wikipedia.org/wiki/Flow_cytometry); you get 4 to 20 separate measurements per cell, over millions of cells, and want to look for correlations. So you might use one column as X, another as Y, and select rows inside a box or something. Then use that mask along with another mask on a different X Y pair. Then do whatever stats on the matching elements that pass both masks. Packed masks work great with SIMD – Peter Cordes Jun 01 '20 at 02:36
  • Other use-cases for packed bits include Linux system calls to set thread affinity, like [`sched_setaffinity(2)`](https://www.man7.org/linux/man-pages/man2/sched_setaffinity.2.html) which takes an opaque type that represents the set of all CPU numbers this thread can be scheduled on. The actual implementation is a bitmask = array of packed bits. – Peter Cordes Jun 01 '20 at 02:39
  • Also related: Howard Hinnant's article: [On `vector`](https://isocpp.org/blog/2012/11/on-vectorbool) talks about the fact that a bit-array is a potentially useful data structure, it's merely unfortunate that it's called `vector` in C++. – Peter Cordes Jun 01 '20 at 02:40
  • 1
    @PeterCordes Yes, absolutely, if I had a set of booleans that were logically the "same idea" so that I naturally think of them as an "array" in some sense, and if I'm then going to mask or filter them or otherwise perform bitwise operations on them, then packing them into bytes might make good sense. As I said earlier, I'm hard pressed to think of the last time I worked on an application where those conditions applied, but you give a couple of good examples, and I'm sure with a little imagination one could think of others. – Jay Jun 02 '20 at 15:14
18

The easiest answer is: it's because the CPU addresses memory in bytes and not in bits, and bitwise operations are slow.

However, it is possible to use bit-sized allocation in C++. There's a std::vector<bool> specialization for bit vectors, and there are also structs taking bit-sized entries (bit-fields).

sukru
  • 2,229
  • 14
  • 15
  • 2
    Not sure I would agree that bitwise operations are slow. ands, nots, xors etc are very fast. It is typically the implementation of the bitwise operations that are slow. At the machine level they are quite fast. Branching... now that is slow. – Hogan Jan 07 '11 at 15:07
  • 6
    Just to make it more clear, if you create a vector of booleans and put 24 booleans into it, it will be taking 3 bytes only (3*8). If you put another boolean in, it will take another byte. Yet, if you push another boolean, it won't take any extra bytes because it uses the "free" bits in the last byte – Pedro Loureiro Jan 07 '11 at 15:08
  • yeah, I also doubt bitwise operations are slow :) – Pedro Loureiro Jan 07 '11 at 15:09
  • 1
    The bit vectors do not create bit-sized allocations. they create byte-sized allocations. It is not possible to allocate a single bit. – John Dibling Jan 07 '11 at 15:11
  • I find it unlikely that bit operations are slow. But std::vector&lt;bool&gt; is slow (now considered a mistake in most circles (but we can't undo it)). – Martin York Jan 07 '11 at 15:13
  • 5
    Reading a single bit in a bit vector requires three operations: shift, and, and another shift again. Writing is two. Whereas individual bytes can be accessed with a single one. – sukru Jan 07 '11 at 15:21
  • @PedroLoureiro: correct, it's not that bitwise operations are slow, it's that read-modify-write to update one bit in a byte is much worse than write-only to store a whole byte (or word on machines without efficient byte stores). It takes extra operations, and RMW would create a false dependency between adjacent bits in a word, preventing out-of-order execution of what would otherwise be independent work. – Peter Cordes May 31 '20 at 21:27
13

Because a byte is the smallest addressable unit in the language.

But you can make bool take 1 bit, for example if you have a bunch of them in a struct, like this:

struct A
{
  bool a:1, b:1, c:1, d:1, e:1;
};
Toon Krijthe
  • 52,876
  • 38
  • 145
  • 202
bratao
  • 1,980
  • 3
  • 21
  • 38
12

You can use bit-fields to get sub-byte-sized integers.

struct X
{
    int   val:4;   // 4 bit int.
};

Though it is usually used to map structures to exact hardware expected bit patterns:

// 1 byte value (on a system where 8 bits is a byte)
struct SomeThing
{
    int   p1:4;   // 4 bit field
    int   p2:3;   // 3 bit field
    int   p3:1;   // 1 bit
};
Aykhan Hagverdili
  • 28,141
  • 6
  • 41
  • 93
Martin York
  • 257,169
  • 86
  • 333
  • 562
11

You could have 1-bit bools and 4- and 2-bit ints. But that would make for a weird instruction set with no performance gain, because it's an unnatural way to look at the architecture. It actually makes sense to "waste" the better part of a byte rather than trying to reclaim the unused bits.

The only app that bothers to pack several bools into a single byte, in my experience, is SQL Server.

Paul Sasik
  • 79,492
  • 20
  • 149
  • 189
6

bool can be one byte -- the smallest addressable size on the CPU -- or it can be bigger. It's not unusual for bool to be the size of int for performance purposes. If for specific purposes (say hardware simulation) you need a type with N bits, you can find a library for that (e.g. the GBL library has a BitSet<N> class). If you are concerned with the size of bool (you probably have a big container), then you can pack the bits yourself, or use std::vector<bool>, which will do it for you (be careful with the latter, as it doesn't satisfy the container requirements).

Gene Bushuyev
  • 5,512
  • 20
  • 19
3

Because in general, the CPU allocates memory with 1 byte as the basic unit, although some CPUs, like MIPS, use a 4-byte word.

However, vector deals with bool in a special fashion: in vector<bool>, one bit is allocated for each bool.

Ardent Coder
  • 3,777
  • 9
  • 27
  • 53
Ryan Li
  • 9,020
  • 7
  • 33
  • 62
  • 1
    I believe even the MIPS cpu will give you access to an individual byte, although there is a performance penalty. – Paul Tomblin Jan 07 '11 at 15:21
  • @Paul: Yes you are right, but generally the word-specific `lw`/`sw` are much more widely used. – Ryan Li Jan 07 '11 at 15:29
  • Don't know about MIPS, but IA-64 architecture allows only access on 64-bit boundary. – Gene Bushuyev Jan 07 '11 at 16:06
  • @PaulTomblin: you're correct, DEC Alpha is the only ISA in recent memory with byte-addressable memory but without actual byte load/store instructions. (See [Can modern x86 hardware not store a single byte to memory?](https://stackoverflow.com/q/46721075) for details). – Peter Cordes Dec 12 '21 at 03:21
  • @GeneBushuyev: Wrong for IA-64. https://www.csee.umbc.edu/portal/help/architecture/aig.pdf#page=41 confirms that IA-64 `ld` instructions supported an access size of 1, 2, 4, or 8 bytes. (For sizes less than 64-bit, the result is zero-extended into a 64-bit reg, like a normal RISC rather than x86 partial-registers.) Since IA-64 was designed by Intel with hopes of taking over from x86 (via emulation, or in early CPUs via hardware support for an IA-32 mode), unaligned word load/store is also optionally supported (even in IA-64 mode). – Peter Cordes Dec 12 '21 at 03:32
3

Think about how you would implement this at your emulator level...

#include <cassert>

int main()
{
    bool a[10] = {false};

    bool &rbool = a[3];
    bool *pbool = a + 3;

    assert(pbool == &rbool);
    rbool = true;
    assert(*pbool);
    *pbool = false;
    assert(!rbool);
}
franji1
  • 3,088
  • 2
  • 23
  • 43
0

Even though the minimum possible size is 1 byte, you can have 8 bits of boolean information in 1 byte:

http://en.wikipedia.org/wiki/Bit_array

Julia language has BitArray for example, and I read about C++ implementations.

Diego Javier Zea
  • 876
  • 10
  • 12
0

The byte is the smallest unit of digital data storage in a computer. In a computer the RAM has millions of bytes, and every one of them has an address. If it had an address for every bit, a computer could manage 8 times less RAM than it can now.

More info: Wikipedia

Francesco Pasa
  • 511
  • 6
  • 14
0

Bitwise operations are not 'slow'.

And/Or operations tend to be fast.

The problem is alignment, and the work of dealing with it.

CPUs, as other answers have correctly noted in part, are generally aligned to read bytes, and RAM/memory is designed the same way.

So packing data into less memory space has to be explicitly arranged.

As one answer suggested, you can specify a number of bits per value in a struct. But what do the CPU and memory do afterward if the result isn't aligned? There is no +1.5 address if you wanted to use half a byte for one value; the remaining space must be filled in or left blank anyway, and then the next aligned space is read. Addresses are aligned by 1 at minimum, and usually by default by 4 (32-bit) or 8 (64-bit) overall. The CPU generally grabs the byte or int value that contains your flags, and then you check or set the ones you need.

So you must still define memory as int, short, byte, or the other proper sizes, but when accessing and setting values you can explicitly compress the data and store flags within a value to save space. Many people are unaware of how this works, or skip the step whenever they have on/off or flag values, even though saving space in sent/received memory is quite useful in mobile and other constrained environments.

Splitting an int into bytes has little value, since you can just define the bytes individually (e.g. int FourBytes; vs. byte Byte1; byte Byte2; byte Byte3; byte Byte4;); using an int there is redundant. However, in managed environments such as Java, where most values may be represented as ints (numbers, booleans, etc.), you can take advantage of dividing an int up into bytes or bits for an ultra-efficient app that sends fewer integers of data (aligned by 4). Managing bits could be called redundant, but it is one of many optimizations where bitwise operations are superior though not always needed; often people take advantage of generous memory limits by simply storing booleans as integers, wasting 500%-1000% of the memory space anyway.

This still easily has its uses: combined with other optimizations, on data streams that carry only bytes or a few KB, it can make the difference between an app that loads (or loads fast) and one that doesn't. Reducing the bytes sent can ultimately benefit you a lot, even if on an everyday internet connection you could get away with over-sending tons of data that never needed to be sent. It is definitely something to do when designing an app for mobile users, and something even big-name corporate apps fail at nowadays: using too much space and imposing loading costs that could be cut in half or more. The difference between doing nothing and piling on unknown packages/plugins that require at minimum many hundreds of KB or 1 MB before anything loads, versus a design built for speed that needs, say, 1 KB or a few KB, is that the latter loads and responds faster for the users who do face data constraints, even if for you loading a wasteful MB of unneeded data is fast.

JohnC
  • 1
  • 1