1

If I have 128 or 256 or 512 bit memory region, how can I find number of consecutive zero bits starting from least significant bit (left-most byte). I can do:

Try it online!

#include <bit>
int CountRZero512(uint64_t const * ptr) {
    for (int i = 0; i < 8; ++i)
        if (ptr[i] != 0)
            return i * 64 + std::countr_zero(ptr[i]);
    return 512;
}

As you can see I used std::countr_zero function of C++20. Actually I generalized this std::countr_zero() function to 512-bit case.

But is it possible to do it really fast using 128/256/512-bit SIMD instructions (i.e. SSE2, AVX-2, AVX-512).

Especially this question is actual if I have a really huge array of 512-bit numbers, i.e. they are located in adjacent entries of array.

As a side second question, I wonder how to check fast if 128/256/512-bit SIMD register is equal to zero (all zero bits). Why? Because it happens in my task that great part of 512-bit numbers are totally zero (lets say half of all numbers are zero), hence I'd like to optimize my CountRZero512() function above to have a shortcut branch that checks if 512 bits are all zeros and immediately returns 512 as answer.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
Arty
  • 14,883
  • 6
  • 36
  • 69
  • 2
    Basically a duplicate of [Efficiently find least significant set bit in a large array?](https://stackoverflow.com/q/67605508) - I don't think AVX-512 fundamentally changes what you'd do, just the details like replacing `vpcmpeqd` against a zeroed vector + `vpmovmskb` with `vptestmd` / `kmov`. (Or `kortest` in a loop to check pairs of vectors). I'm not sure `vplzcntd` would be useful. If someone wants to write an AVX-512 version, it should probably get added as an answer on the linked duplicate. (If I do, I'll add it to my answer). – Peter Cordes Nov 07 '22 at 23:41

0 Answers0