For code related to this question, I need to compute the following as fast as possible:
Given a 32 bit integer i, compute the position of the n-th least significant set bit. Both n and the result should be 0-indexed.
For example, given the number i = 110101101012 and n = 4, the desired number is 7 as the fourth set bit is at position 7: 11010110101.
Using the pdep
instruction from the BMI2 instruction set extension for x86 and the commonly available __builtin_ctz()
intrinsic function, this can be computed easily:
j = _pdep_u32(1 << n, i);
return (__builtin_ctz(j));
However, many computers do not have the pdep
instruction (and it's slow on AMD CPUs that do have it), rendering this approach non-portable. You can also compute such bit positions without pdep
like this:
j = i;
for (k = 0; k < n; k++)
j &= j - 1;
return (__builtin_ctz(j));
However, this is pretty slow.
I am targeting computers that provide at least __builtin_popcount()
and __builtin_ctz()
. How can I find such bit positions faster?