-1

What can be a faster algorithm for doing bitwise AND operation on large bit-arrays? I have implemented bit-array in C++ using a char array. For now, I am iterating over each byte and performing AND operation.

void ANDoperation(char* A, char* B){
  for (int i=0; i<(array_size/8 +1); i++ ){
    A[i] &= B[i];
  }
}

For K arrays, I am executing this function iteratively K-1 times.

NutCracker
  • 11,485
  • 4
  • 44
  • 68
  • 3
    What are your requirements regarding to "speed" and "efficiency"? Is this measured to be a primary (top-two) bottleneck of your program? Have you done your measurement and benchmarking with compiler optimizations enabled? – Some programmer dude Feb 19 '20 at 07:27
  • this is the fastest way. The compiler will automatically vectorize the loop for you so it won't even loop over each `char` – phuclv Feb 19 '20 at 07:28
  • 1
    If you know that your arrays are properly aligned, and they're longer than 2 or 4 bytes, you can perform the AND in blocks of `uint16_t`, `uint32_t` or `uint64_t` (at the cost of substantially more complex code). – Dave M. Feb 19 '20 at 07:29
  • Your code might access beyond arrays in special cases or alternatively only work on a fraction of the total arrays. In that case I would first discuss fixing, before discussing optimising. Please provide a [mre]. – Yunnosch Feb 19 '20 at 07:37
  • 1
    @Someprogrammerdude yeah. I have tried with O2 optimisation. This is the bottleneck. The bit array size in my case is > 10^6. – Gaurav Gupta Feb 19 '20 at 07:46
  • Thanks @DaveM. Looks like I should try uint64_t array. That should help. – Gaurav Gupta Feb 19 '20 at 07:47
  • what's array_size? When posting code samples, please provide the minimal example that will work if copied https://stackoverflow.com/help/minimal-reproducible-example – Eyal D Feb 19 '20 at 08:16
  • 3
    `O2` may not turn on vectorization. Try `O3`, and consider making your pointers _restricted_. Without `restrict`, vectorization might be avoided. It may be also good idea to align arrays properly for vectorization (e.g., at 64-bit boundary for AVX-512). This question may be relevant: [How to tell GCC that a pointer argument is always double-word-aligned?](https://stackoverflow.com/q/9608171/580083) How do you measure it's a bottleneck? 10^6 is a very small array. – Daniel Langr Feb 19 '20 at 08:21
  • 1
    Compare these versions here: https://godbolt.org/z/hedNBo. You can observe that, in the second case, there is a minimum amount of instructions in the loop (1x vector-register load, 1x and, 1x store). – Daniel Langr Feb 19 '20 at 08:31
  • @DaveM. that's not the fastest way, and definitely not "substantially more complex". The fastest way is to use SIMD (which is far more complex than `uint64_t*`), but [compilers will automatically do that for you even if you pass a char array](https://godbolt.org/z/XZELfA). Look at the `vpand` instructions there – phuclv Feb 19 '20 at 08:45
  • Adding __ restrict __ to the parameters is the key to getting the compiler to auto-vectorize. – Andrew Bainbridge Feb 19 '20 at 15:15
  • In my tests, the above code with restrict made the compiler use the YMM registers and handle 512 bits per AND instruction. Whereas the std::bitset version compiled down to using normal registers and only did 64-bits per AND instruction. – Andrew Bainbridge Feb 19 '20 at 15:21

1 Answers1

1

If you would like to go for a more C++-ish way, I would suggest you using std::bitset in a following manner:

#include <iostream>
#include <bitset>

int main()  {
    std::bitset<3> v1(0b110);
    std::bitset<3> v2(0b011);

    v1 &= v2;

    std::cout << v1.to_string() << std::endl; // 010

    return 0;
}

Demo

or as @AndrewBainbridge suggested:

void and_operation(std::bitset<3000>& v1, std::bitset<3000> const& v2) {
    v1 &= v2;
}
NutCracker
  • 11,485
  • 4
  • 44
  • 68
  • you didn't specify `-O3` so no optimization is done. But even with optimization this will result very bad performance because `std::vector` is a specialization which operates on bit. Just check the output assembly you'll see no vectorization is done – phuclv Feb 19 '20 at 09:23
  • The Demo link (to a Godbolt session) shows that the compiler evaluated the AND at compile time. Here's a better session: https://godbolt.org/z/XN5k5u – Andrew Bainbridge Feb 19 '20 at 14:53
  • The length of the bitset needs to be large, otherwise the compiler does the whole array in a single instruction. I set the length to 3000 in my example. The resulting code is a loop that processes 64 bits per iteration. – Andrew Bainbridge Feb 19 '20 at 15:01
  • 1
    @AndrewBainbridge yes, you are right. I put 3 only as an example but will put 3000. Thank you – NutCracker Feb 19 '20 at 15:02
  • 1
    Also, "in your case" might be better as "in Andrew's case", since we've already got two commenters. – Andrew Bainbridge Feb 19 '20 at 15:02