0

I am looking for an efficient way of calculating the kth non-zero bit in a 32 bit int.

The best I can think of is a serial algorithm, using ctz (count trailing zeros):

uint test = 0x88;
int p=0;
for (0; p < k; ++p){
  int pos = ctz(test);
  test >>= pos;
}

but I am looking for something more parallel. This is for an opencl kernel.

Edit: For the above example, the first non-zero bit is at position 2 (zero based) and second non-zero bit is at position 5. There are no other non-zero bits.

Thanks!

Jacko
  • 12,665
  • 18
  • 75
  • 126

1 Answers1

2

The question is somewhat unclear but I presume that your desired output is the bit position of the Nth set bit in the input integer.

Alas I am not certain there is much room to speed this up. However we can try bringing the complexity down from a linear to a logarithmic operation, which may or may not help depending on the typical bit number sought after. The integer width is small and constant here so the scope for improvement is limited.

The idea is to recursively perform a population counts on the lower half of the input integer. If the target bit exceeds the number in the lower half, then select the upper and recurse on the upper, otherwise select the lower.

A neat benefit is that partial totals of the traditional recursive population count trick can be reused in the search.

Something along these, virtually untested, lines:

unsigned int position_of_nth_set_bit(uint_fast32_t input, unsigned int bitno) {
    // Perform the common population-count trick of recursively building up the
    // sub counts as hierarchal bit-fields
    const uint_fast32_t mask1 = UINT32_C(0x55555555);
    const uint_fast32_t mask2 = UINT32_C(0x33333333);
    const uint_fast32_t mask4 = UINT32_C(0x0F0F0F0F);
    const uint_fast32_t mask8 = UINT32_C(0x00FF00FF);

    const uint_fast32_t pop1  = input;
    const uint_fast32_t pop2  = (pop1 >> 1 & mask1) + pop1 & mask1;
    const uint_fast32_t pop4  = (pop2 >> 2 & mask2) + pop2 & mask2;
    const uint_fast32_t pop8  = (pop4 >> 4 & mask4) + pop4 & mask4;
    const uint_fast32_t pop16 = (pop8 >> 8 & mask8) + pop8 & mask8;

    unsigned int bitpos = 0;

    // Recursively check whether our target bit fits into the upper or lower
    // half-space, and shift down according
    unsigned int halfspace;
    if(halfspace = (pop16 & 15), bitno > halfspace) {
        bitno -= halfspace;
        bitpos += 16;
    }
    if(halfspace = (pop8 >> bitpos) & 7, bitno > halfspace) {
        bitno -= halfspace;
        bitpos += 8;
    }
    if(halfspace = (pop4 >> bitpos) & 7, bitno > halfspace) {
        bitno -= halfspace;
        bitpos += 4;
    }
    if(halfspace = (pop2 >> bitpos) & 3, bitno > halfspace) {
        bitno -= halfspace;
        bitpos += 2;
    }
    if(halfspace = (pop1 >> bitpos) & 1, bitno > halfspace)
        bitpos += 1;

    return bitpos;
}
doynax
  • 4,285
  • 3
  • 23
  • 19