Imagine that we are given a sorted sequence of integers, which is stored as a bitset in a single 32-bit or 64-bit integer. For example:
sequence = 0, 2, 3, 5, 6, 8, 11, 14;
mask = 1011011010010010;
16-bit bitset = 0100100101101101b = 0x496D;
Each integer in the sequence has its index, which equals number of integers before it, and position, which is simply the value of the integer. The same terms can be applied to any set bit in the corresponding bitset. For example, in the above example there are set bits with position=2 and index=1, and with position=8 and index=5.
Using popcnt
instruction, we can easily find index of set bit from position on modern x86 architecture like this:
uint32_t position_to_index(uint32_t bset, uint32_t pos) {
(a) return _mm_popcnt_u32(_bzhi_u32(bset, pos)); //BMI2 & POPCNT
(b) return _mm_popcnt_u32(bset & ((1 << pos) - 1))); //POPCNT only
}
Clearly, it also works on any other architecture with hardware popcount support.
The question is: how to efficiently compute the inverse function, which returns position of k-th set bit in a given integer by given index k?
To make things simple, it can be assumed that a set bit with specified index surely exists in a given bitset, i.e. no error checking is necessary. Fast performance on random input bitsets is desired, so supposedly branchless solutions would be good. Using hardware-specific instructions is welcome. Fast solutions for small (but randomly distrubuted) indices would also be interesting.
Here is a simple base solution, which should work well for small indices:
uint32_t index_to_position(uint32_t bset, uint32_t idx) {
for (uint32_t i = 0; i < idx; i++)
(a) bset = _blsr_u32(bset); //BMI1
(b) bset = bset & (bset - 1); //pure C
return _bit_scan_forward(bset); //bsf (>=386)
}