If I have 128 or 256 or 512 bit memory region, how can I find number of consecutive zero bits starting from least significant bit (left-most byte). I can do:
#include <bit>
int CountRZero512(uint64_t const * ptr) {
for (int i = 0; i < 8; ++i)
if (ptr[i] != 0)
return i * 64 + std::countr_zero(ptr[i]);
return 512;
}
As you can see I used std::countr_zero function of C++20. Actually I generalized this std::countr_zero() function to 512-bit case.
But is it possible to do it really fast using 128/256/512-bit SIMD instructions (i.e. SSE2, AVX-2, AVX-512).
Especially this question is actual if I have a really huge array of 512-bit numbers, i.e. they are located in adjacent entries of array.
As a side second question, I wonder how to check fast if 128/256/512-bit SIMD register is equal to zero (all zero bits). Why? Because it happens in my task that great part of 512-bit numbers are totally zero (lets say half of all numbers are zero), hence I'd like to optimize my CountRZero512()
function above to have a shortcut branch that checks if 512 bits are all zeros and immediately returns 512 as answer.