10

For code related to this question, I need to compute the following as fast as possible:

Given a 32 bit integer i, compute the position of the n-th least significant set bit. Both n and the result should be 0-indexed.

For example, given the number i = 110101101012 and n = 4, the desired number is 7 as the fourth set bit is at position 7: 11010110101.

Using the pdep instruction from the BMI2 instruction set extension for x86 and the commonly available __builtin_ctz() intrinsic function, this can be computed easily:

j = _pdep_u32(1 << n, i);
return (__builtin_ctz(j));

However, many computers do not have the pdep instruction (and it's slow on AMD CPUs that do have it), rendering this approach non-portable. You can also compute such bit positions without pdep like this:

j = i;
for (k = 0; k < n; k++)
    j &= j - 1;

return (__builtin_ctz(j));

However, this is pretty slow.

I am targeting computers that provide at least __builtin_popcount() and __builtin_ctz(). How can I find such bit positions faster?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
fuz
  • 88,405
  • 25
  • 200
  • 352

0 Answers0