4

Is it even possible to create an array of bits with more than 100000000 elements? If it is, how would I go about doing this? I know that for a char array I can do this:

char* array;

array = (char*)malloc(100000000 * sizeof(char));

If I was to declare the array by char array[100000000] then I would get a segmentation fault, since the maximum number of elements has been exceeded, which is why I use malloc.

Is there something similar I can do for an array of bits?

Eddy
  • 6,661
  • 21
  • 58
  • 71
  • Under what circumstances would you actually need that many bits in a single location? – Dennis Zickefoose Mar 27 '10 at 18:04
  • 1
    Exact duplicate - same user - yesterday: http://stackoverflow.com/questions/2525310 – Paul R Mar 27 '10 at 19:45
  • Sorry if the question seems similar, but it is in fact NOT exactly the same question. Previously I asked how to define an array of bits in general. Now I'm asking how to define one that is very large (i.e, more elements than is usually possible in a standard way of defining the array). – Eddy Mar 28 '10 at 13:34
  • You're probably trying to create that on the stack. If you made a global that size it would probably work. If you need to make a data structure larger than will fit in size_t for your target you'll need to investigate storing parts of your data on secondary storage (in files). Depending on your usage pattern(s) there are several file structures that may work well for you. Look up "b tree". – nategoose Mar 29 '10 at 01:55

9 Answers9

12

If you are using C++, std::vector<bool> is specialized to pack elements into a bit map. Of course, if you are using C++, you need to stop using malloc.

Dennis Zickefoose
  • 10,791
  • 3
  • 29
  • 38
  • 2
    When using `vector` be aware that it does not staisfy the container concept of the STL, which may cause problems when using STL-algorithms. See http://www.sgi.com/tech/stl/bit_vector.html – Björn Pollex Mar 27 '10 at 18:05
  • I had forgotten std::vector was specialized. Good point. – cpalmer Mar 27 '10 at 18:06
  • What do you mean by specialized? – Eddy Mar 28 '10 at 14:05
  • @Eddy: I'd have to explain what a template is first. Have you used templates before? – Ben Voigt Mar 28 '10 at 17:39
  • @Eddy: Specialized (in this case) just means that for the datatype bool, vector doesn't use the generic vector implementation. This that would waste anywhere from 7 to 63 bits of data per entry on common architectures with common compilers. vector is implemented the same way I told you to do this the other day when you asked how to do it in C. – nategoose Mar 29 '10 at 02:00
8

You could try looking at boost::dynamic_bitset. Then you could do something like the following (taken from Boost's example page):

boost::dynamic_bitset<> x(100000000); // all 0's by default
x[0] = 1;
x[1] = 1;
x[4] = 1;

The bitset will use a single bit for each element so you can store 32 items in the space of 4 bytes, decreasing the amount of memory required considerably.

cpalmer
  • 1,107
  • 6
  • 10
  • Will this work? Aren't most C++ compilers going to try to allocate x on a stack, which won't have room for such a large object? The stack is typically only 1MB or so. I think you'll probably have to use new to allocate it from the heap. – Charles E. Grant Mar 27 '10 at 18:11
  • 3
    @Charles: `x` will be on the stack, but the memory to store the bits will not be. Like `vector`, `boost::dynamic_bitset` takes an allocator template parameter, and by default allocates with `new`. There is no way in standard C++ to put anything on the stack whose size isn't known at compile time. – Steve Jessop Mar 27 '10 at 18:32
  • x is on the stack, but that doesn't mean x's members are. – Dennis Zickefoose Mar 27 '10 at 18:32
  • 1
    @Dennis: x's members are also on the stack... but this doesn't mean some of the members couldn't be pointers to dynamically allocated memory. – UncleBens Mar 27 '10 at 21:29
5

In C and C++, char is the smallest type. You can't directly declare an array of bits. However, since an array of any basic type is fundamentally made of bits, you can emulate them, something like this (code untested):

unsigned *array;
array = (unsigned *) malloc(100000000 / sizeof(unsigned) + 1);

/* Retrieves the value in bit i */
#define GET_BIT(array, i) (array[i / sizeof(unsigned)] & (1 << (i % sizeof(unsigned))))

/* Sets bit i to true*/
#define SET_BIT(array, i) (array[i / sizeof(unsigned)] |= (1 << (i % sizeof(unsigned))))

/* Sets bit i to false */
#define CLEAR_BIT(array, i) (array[i / sizeof(unsigned)] &= ~(1 << (i % sizeof(unsigned))))
Daniel Stutzbach
  • 74,198
  • 17
  • 88
  • 77
4

The segmentation fault you noticed is due to running out of stack space. Of course you can't declare a local variable that is 12.5 MB in size (100 million bits), let alone 100MB in size (100 million bytes) in a thread with a stack of ~ 4 MB. Should work as a global variable, although then you may end up with a 12 or 100 MB executable file -- still not a good idea. Dynamic allocation is definitely the right thing to do for large buffers like that.

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
3

If it is allowed to use STL, then I would use std::bitset.

(For 100,000,000 bits, it would use 100000000 / 32 unsigned int underneath, each storing 32 bits.)

std::vector<bool>, already mentioned, is another good solution.

Arun
  • 19,750
  • 10
  • 51
  • 60
1

There are a few approaches to creating a bitmap in C++.

If you already know the size of bitmap at compile time, you can use the STL, std::bitset template.

This is how you would do it with bitset std::bitset<100000000> array

Otherwise, if the size of the bitmap changes dynamically during runtime, you can use std::vector<bool> or boost::dynamic_bitset as recommended here http://en.cppreference.com/w/cpp/utility/bitset (See note at the bottom)

vincentleest
  • 925
  • 1
  • 8
  • 18
  • Your answer doesn't contain any code and the link has no context. Please see [http://stackoverflow.com/help/how-to-answer](http://stackoverflow.com/help/how-to-answer). – A.L Apr 21 '14 at 01:46
0

Yes but it's going to be a little bit more complicated !

The better way to store bits is to use the bits into the char itself !

So you can store 8 bits in a char !

Which will "only" require 12'500'000 octets !

Here is some documentation about binaries : http://www.somacon.com/p125.php

You should look on google :)

Nicolas Guillaume
  • 8,160
  • 6
  • 35
  • 44
0

Other solution:

unsigned char * array;
array = (unsigned char *) malloc ( 100000000 / sizeof(unsigned char) + 1);

bool MapBit ( unsigned char arraybit[], DWORD position, bool set)
{
    //work for 0 at 4294967295 bit position
    //calc bit position
    DWORD bytepos = ( position / 8 );
    //
    unsigned char bitpos =  ( position % 8);

    unsigned char bit = 0x01;

    //get bit
    if ( bitpos )
    {
        bit = bit << bitpos;
    }

    if ( set )
    {
        arraybit [ bytepos ] |= bit;
    }
    else
    {
        //get
        if ( arraybit [ bytepos ] & bit )
            return true;
    }

    return false;
}
lsalamon
  • 7,998
  • 6
  • 50
  • 63
0

I'm fond of the bitarray that's in the open source fxt library at http://www.jjj.de/fxt/. It's simple, efficient and contained in a few headers, so it's easy to add to your project. Plus there's many complementary functions to use with the bitarray (see http://www.jjj.de/bitwizardry/bitwizardrypage.html).

dimatura
  • 1,230
  • 1
  • 11
  • 14