-2

I want to implement a very long boolean array (as a binary genome) and access some intervals to check if that interval is all true or not, and in addition I want to change some intervals value,

For example, I can create 4 representations:

boolean binaryGenome1[10e6]={false};
vector<bool> binaryGenome2; binaryGenome2.resize(10e6);
vector<char> binaryGenome3; binaryGenome3.resize(10e6);
bitset<10e6> binaryGenome4;

and access this way:

inline bool checkBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    if(binaryGenome[i] == false)
        return false;
  return true;
}
inline void changeBinGenome(long long start , long long end){
  for(long long i = start; i < end+1 ; i++)
    binaryGenome[i] = true;
}

vector<char> and normal boolean array (ass stores every boolean in a byte) both seem to be a poor choice as I need to be efficient in space. But what are the differences between vector<bool> and bitset?

Somewhere else I read that vector has some overhead as you can choose it's size and compile time - "overhead" for what - accessing? And how much is that overhead?

As I want to access array elements many times using CheckBinGenome() and changeBinGenome(), what is the fastest implementation?

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
ameerosein
  • 523
  • 3
  • 16
  • 3
    You haven't done enough homework. `std::vector` is special - it is a space-efficient representation. Also look up `std::bitset`, which is also a space-efficient representation of an array of `bool`, except its size is fixed at compile time. – Peter Sep 24 '16 at 06:47
  • See http://stackoverflow.com/questions/3806469/bit-array-in-c – John Burger Sep 24 '16 at 06:47
  • @Peter `std::vector` is considered a bad specialization, `std::bitset` is way better probably. – πάντα ῥεῖ Sep 24 '16 at 06:54
  • what about parallelizing the process? – Frederick Zhang Sep 24 '16 at 07:17
  • 1
    @ πάντα ῥεῖ - `std::vector` is considered a bad specialisation, because it behaves differently from other standard containers in quite a few ways. However, if you work within its limits (in particular, don't expect it to play like other standard containers) it is still useful. – Peter Sep 24 '16 at 07:22
  • @FrederickZhang thanks for your suggestion, now my code uses multi threads but this is a reference DS which i need to check in all threads – ameerosein Sep 24 '16 at 07:37
  • @Peter sorry, you're right. I've added alternatives so can you explain the differences and the best answer? thanks – ameerosein Oct 07 '16 at 06:51

3 Answers3

2

Use std::bitset It's the best.

vivek
  • 4,951
  • 4
  • 25
  • 33
  • are accessing functions ran in constant time? i mean operator[] and .test – ameerosein Oct 06 '16 at 15:46
  • and what are the advantages of std::bitset in comparison with vecor ? thanks. – ameerosein Oct 07 '16 at 06:52
  • Could you please explain what characteristics make it the "best"? Is there some objective measure you've used, or is this just an opinion? AFAICS, the only meaningful difference between `bitset` and `vector` (yes, the bastard non-container specialization) is that the `vector` is resizable. Is there wording that ensures `bitset` must be more space-efficient than `bool[]`? – Toby Speight Oct 07 '16 at 12:52
1

If the length of the data is known at compile time, consider std::array<bool> or std::bitset. The latter is likely to be more space-efficient (you'll have to measure whether the associated extra work in access times outweighs the speed gain from reducing cache pressure - that will depend on your workload).

If your array's length is not fixed, then you'll need a std::vector<bool> or std::vector<char>; there's also boost::dynamic_bitset but I've never used that.

If you will be changing large regions at once, as your sample implies, it may well be worth constructing your own representation and manipulating the underlying storage directly, rather than one bit at a time through the iterators. For example, if you use an array of char as the underlying representation, then setting a large range to 0 or 1 is mostly a memset() or std::fill() call, with computation only for the values at the start and end of the range. I'd start with a simple implementation and a good set of unit tests before trying anything like that.

It is (at least theoretically) possible that your Standard Library has specialized versions of algorithms for the iterators of std::vector<bool>, std::array<bool> and/or std::bitset that do exactly the above, or you may be able to write and contribute such specializations. That's a better path if possible - the world may thank you, and you'll have shared some of the maintenance responsibility.

Important note

If using std::array<bool>, you do need to be aware that, unlike other std::array<> instantiations, it does not implement the standard container semantics. That's not to say it shouldn't be used, but make sure you understand its foibles!

Community
  • 1
  • 1
Toby Speight
  • 27,591
  • 48
  • 66
  • 103
  • missing in your answer: what are differences between vector and bitset? more than fixed or no fixed length at compile time. can you explain differences in speed? why bitset is faster? – ameerosein Oct 07 '16 at 14:51
0

E.g., checking whether all the elements are true

I am really NOT sure whether this will give us more overheads than speedup or not. Actually I think that nowadays CPU can do this quite fast, are you really experiencing a poor performance? (or is this just a skeleton of your real problem?)

#include <omp.h>
#include <iostream>
#include <cstring>

using namespace std;

#define N 10000000
bool binaryGenome[N];

int main() {
    memset(binaryGenome, true, sizeof(bool) * N);
    int shouldBreak = 0;
    bool result = true;
    cout << result << endl;
    binaryGenome[9999995] = false;

    bool go = true;
    uint give = 0;
#pragma omp parallel
    {
        uint start, stop;

#pragma omp critical
        {
            start = give;
            give += N / omp_get_num_threads();
            stop = give;

            if (omp_get_thread_num() == omp_get_num_threads() - 1)
                stop = N;
        }

        while (start < stop && go) {
            if (!binaryGenome[start]) {
                cout << start << endl;
                go = false;
                result = false;
            }
            ++start;
        }
    }

    cout << result << endl;
}
Frederick Zhang
  • 3,593
  • 4
  • 32
  • 54