19

I want to store bits in an array (like structure). So I can follow either of the following two approaches

Approach number 1 (AN 1)

struct BIT
{
   int data : 1
};

int main()
{
   BIT a[100];
   return 0;
}

Approach number 2 (AN 2)

int main()
{
    std::bitset<100> BITS;
    return 0;
}

Why would someone prefer AN 2 over AN 1?

CLOWN
  • 193
  • 1
  • 1
  • 4
  • 3
    To quote cplusplus.com's page on bitset, "The class is very similar to a regular array, but optimizing for space allocation". If your ints are 4 bytes, a bitset uses 32 times less space. – AlcubierreDrive Oct 22 '10 at 15:00
  • 3
    Your structure `BIT` will be aligned anyway to (at least) one byte. – Archie Oct 22 '10 at 15:03
  • @Jon, post that as an answer. (It's a good point.) – Jon-Eric Oct 22 '10 at 15:04
  • 2
    @sbi Most implementations have >= 1-byte bools, so a bitset is still 8 times more efficient. – AlcubierreDrive Oct 22 '10 at 15:07
  • 1
    It should also be mentioned that depending on how you use it, AN1, while using more memory, has faster access time than AN2. I say depending since if the array is huge, the bitset/vector version may still be faster due to CPU caching of memory. – edA-qa mort-ora-y Oct 22 '10 at 16:07
  • @edA-qa mort-ora-y: an array of `bool` or `uint8_t` might be a good compromise for some cases, but a storage overhead of 31/32 = 96,875% is very seldom a good idea. – Fred Foo Oct 22 '10 at 18:11
  • @Jon: I know. But I think it's way better than that `struct` array. – sbi Oct 22 '10 at 21:15

5 Answers5

21

Because approach nr. 2 actually uses 100 bits of storage, plus some very minor (constant) overhead, while nr. 1 typically uses four bytes of storage per Bit structure. In general, a struct is at least one byte large per the C++ standard.

#include <bitset>
#include <iostream>

struct Bit { int data : 1; };

int main()
{
    Bit a[100];
    std::bitset<100> b;
    std::cout << sizeof(a) << "\n";
    std::cout << sizeof(b) << "\n";
}

prints

400
16

Apart from this, bitset wraps your bit array in a nice object representation with many useful operations.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • How come the output of `sizeof(b)` is 16? 100 bits means 100/8 bytes. What am I missing? – CLOWN Oct 22 '10 at 15:08
  • 1
    It has to be "rounded" to a multiple of eight, since a byte is the smallest unit of storage that is directly addressable. Try compiling with 128 bits in the set, then it's still 16. `bitset` uses some tricks to pretend it is actually addressing bits.. – Fred Foo Oct 22 '10 at 15:12
  • My previous comment was half right. It's rounded up to a multiple of 4 bytes, because a four-byte unit is the fastest to process for a 32-bit computer. But in any case, a byte is the smallest addressable unit, so it's at least 13 (100/8 = 12.5, and half a byte is not addressable). Sorry, friday afternoon. – Fred Foo Oct 22 '10 at 15:49
7

A good choice depends on how you're going to use the bits.

std::bitset<N> is of fixed size. Visual C++ 10.0 is non-conforming wrt. to constructors; in general you have to provide a workaround. This was, ironically, due to what Microsoft thought was a bug-fix -- they introduced a constructor taking int argument, as I recall.

std::vector<bool> is optimized in much the same way as std::bitset. Cost: indexing doesn't directly provide a reference (there are no references to individual bits in C++), but instead returns a proxy object -- which isn't something you notice until you try to use it as a reference. Advantage: minimal storage, and the vector can be resized as required.

Simply using e.g. unsigned is also an option, if you're going to deal with a small number of bits (in practice, 32 or less, although the formal guarantee is just 16 bits).

Finally, ALL UPPERCASE identifiers are by convention (except Microsoft) reserved for macros, in order to reduce the probability of name collisions. It's therefore a good idea to not use ALL UPPERCASE identifiers for anything else than macros. And to always use ALL UPPERCASE identifiers for macros (this also makes it easier to recognize them).

Cheers & hth.,

Cheers and hth. - Alf
  • 142,714
  • 15
  • 209
  • 331
2

bitset has more operations

Wernight
  • 36,122
  • 25
  • 118
  • 131
0

Approach number 1 will most likely be compiled as an array of 4-byte integers, and one bit of each will be used to store your data. Theoretically a smart compiler could optimize this, but I wouldn't count on it.

Is there a reason you don't want to use std::bitset?

Jonathan
  • 13,354
  • 4
  • 36
  • 32
0

To quote cplusplus.com's page on bitset, "The class is very similar to a regular array, but optimizing for space allocation". If your ints are 4 bytes, a bitset uses 32 times less space.

Even doing bool bits[100], as sbi suggested, is still worse than bitset, because most implementations have >= 1-byte bools.

If, for reasons of intellectual curiosity only, you wanted to implement your own bitset, you could do so using bit masks:

typedef struct {
  unsigned char bytes[100];
} MyBitset;

bool getBit(MyBitset *bitset, int index)
{
  int whichByte = index / 8; 
  return bitset->bytes[whichByte] && (1 << (index = % 8));
}

bool setBit(MyBitset *bitset, int index, bool newVal)
{
  int whichByte = index / 8;

  if (newVal)
  {
    bitset->bytes[whichByte] |= (1 << (index = % 8));
  }
  else
  {
    bitset->bytes[whichByte] &= ~(1 << (index = % 8));
  }
}

(Sorry for using a struct instead of a class by the way. I'm thinking in straight C because I'm in the middle of a low-level assignment for school. Obviously two huge benefits of using a class are operator overloading and the ability to have a variable-sized array.)

AlcubierreDrive
  • 3,654
  • 2
  • 29
  • 45
  • 4
    In c++ the only difference between a class and struct is that class members are private by default and struct members are public by default. – M2tM Jun 15 '12 at 21:39