0

How to convert bitarray to set quickly with c++? Each of the actual bitarrays has 750,000 bits.

Example 1:

bitarray: 01011111
set: {0,1,2,3,4,5,7}
or set: {1,3,4,5,6,7}

Example 2:

bitarray: 0101 1111 0001 0001
set: {0,4,8,9,10,11,12,14}
or set: {1,3,4,5,6,7,11,15}

The set is an array of usigned 32 bit integers (uint32_t). Both kinds of set are acceptable.

The bitarray is contiguous in memory. The first bit of the bitarray has correct alignment for simd. For now I am using a custom memory allocator with std::vector to hold the bitarray. 1 bit in memory per 1 bit in the bitarray.

Thanks.

Update:

this so question does the reverse

loop through bits in c

How to define and work with an array of bits in C?

gmpy uses the scan1 function of the gmp library. scan1 seems find first set, as in wikipedia here

Community
  • 1
  • 1
rxu
  • 1,369
  • 1
  • 11
  • 29
  • What is your container for the bit array? – Alden Jul 12 '16 at 17:23
  • so far it is a std::vector – rxu Jul 12 '16 at 17:24
  • std::vector? Or are you storing the bits in a numeric type? – Alden Jul 12 '16 at 17:25
  • Where are you stuck,and what have you tried? 750,000 bits is only about 91.1 KB. What's the expected output format? Are you asking for bracketed, comma separated ascii values, like those shown, or perhaps something else? – enhzflep Jul 12 '16 at 17:27
  • numeric type now. 8 bits packed as an unsigned 8 bit integer. – rxu Jul 12 '16 at 17:28
  • problem is there are lots and lots of 750,000 bits. They are made on demand. – rxu Jul 12 '16 at 17:28
  • yea. but sets get made, sets get destroyed. the sets have lots of element at the beginning. then less later. When there are less I switched to another format. That later part of program is done (for now...). – rxu Jul 12 '16 at 17:30
  • I am really not trolling. The thing is I just started writing c++ like 2-3 weeks ago. right now i am reading the gmpy source code to figure that out. I was writing python before that. – rxu Jul 12 '16 at 17:33

2 Answers2

1

If I understand your question:

for (int i = 0; i < 750000; ++i) {
    if (bitarray_has(bitarray, i)) {
        set_of_numbers.push_back(i);
    }
}

I don't believe walking bitarray will be particularly slow, but the push_back() can be made faster if you know how many elements will be created. Then you can use reserve() to pre-allocate the memory.

jxh
  • 69,070
  • 8
  • 110
  • 193
0

code:

#include <iostream>
#include <vector>
#include <time.h>

using namespace std;

template <typename T>
uint32_t bitarray2set(T& v, uint32_t * ptr_set){
    uint32_t i;
    uint32_t base = 0;
    uint32_t * ptr_set_new = ptr_set;
    uint32_t size = v.capacity();
    for(i = 0; i < size; i++){
        find_set_bit(v[i], ptr_set_new, base);
        base += 8*sizeof(uint32_t);
    }
    return (ptr_set_new - ptr_set);
}

inline void find_set_bit(uint32_t n, uint32_t*& ptr_set, uint32_t base){
    // Find the set bits in a uint32_t
    int k = base;
    while(n){
        if (n & 1){
            *(ptr_set) = k;
            ptr_set++;
        }
        n = n >> 1;
        k++;
    }
}

template <typename T>
void rand_vector(T& v){
    srand(time(NULL));
    int i;
    int size = v.capacity();
    for (i=0;i<size;i++){
        v[i] = rand();
    }
}

template <typename T>
void print_vector(T& v, int size_in = 0){
    int i;

    int size;
    if (size_in == 0){
        size = v.capacity();
    } else {
        size = size_in;
    }
    for (i=0;i<size;i++){
        cout << v[i] << ' ';
    }
    cout << endl;
}

int main(void){
    const int test_size = 6000;
    vector<uint32_t> vec(test_size);
    vector<uint32_t> set(test_size*sizeof(uint32_t)*8);
    rand_vector(vec);
    //for (int i; i < 64; i++) vec[i] = -1;
    //cout << "input" << endl;
    print_vector(vec);
    //cout << "calculate result" << endl;

    int i;
    int rep = 10000;
    uint32_t res_size;

    struct timespec tp_start, tp_end;
    clock_gettime(CLOCK_MONOTONIC, &tp_start);
    for (i=0;i<rep;i++){
        res_size = bitarray2set(vec, set.data());
    }
    clock_gettime(CLOCK_MONOTONIC, &tp_end);
    double timing;
    const double nano = 0.000000001;

    timing = ((double)(tp_end.tv_sec  - tp_start.tv_sec )
           + (tp_end.tv_nsec - tp_start.tv_nsec) * nano) /(rep);

    cout << "timing per cycle: " << timing << endl;
    cout << "print result" << endl;
    //print_vector(set, res_size);
}

result (compiled with icc -O3 code.cpp -lrt)

...
timing per cycle: 0.000739613
print result

0.0008 seconds to convert 768000 bits to set. But there are at least 10,000 arrays of 768,000 bits in each cycle. That is 8 seconds per cycle. That is slow.

rxu
  • 1,369
  • 1
  • 11
  • 29