7

I want to find the fastest way to get the index of the lowest order bit of a long long. ie:

00101001001000 -> 3

Solutions involving looping and shifting are too slow. ie:

int i;
if(bits == 0ULL) {
  i = 64;
} else {
  for(i = 0;!(bits & 1ULL);i++)
    bits >>= 1;
}

EDIT: Info on usage

The function that uses ffsll can't really reduce its usage, but here it is (simplified of course). It just iterates through the indices and does something with them. This function is perhaps the most widely used function in my whole application despite a lot of caching of its value. It's a legal move generator in my alpha-beta search engine.

while(bits){
  index = ffsll(bits);
  doSomething(index);
  index &= index-1;
}
dharga
  • 2,187
  • 3
  • 24
  • 33

11 Answers11

11

Intel has specialized instructions for finding either lowest or highest order set bit. BSF seems like what you need. as far as doing it in plain C, maybe the bit twiddling hacks page has what you need.

At the very least you could use a table of nibbles or bytes to speed things up. Something like this (demonstrated for int, but easily changeable to longlong as needed).

/*
0000 - 0
0001 - 1
0010 - 2
0011 - 1
0100 - 3
0101 - 1
0110 - 2
0111 - 1
1000 - 4
1001 - 1
1010 - 2
1011 - 1
1100 - 3
1101 - 1
1110 - 2
1111 - 1
*/

int ffs(int i) {
    int ret = 0;
    int j = 0;
    static const int _ffs_tab[] = 
        { 0, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 1, 3, 1, 2, 1 };

    while((i != 0) && (ret == 0)) {
        ret = _ffs_tab[i & 0x0f];

        if(ret > 0) {
            break;
        }

        i >>= 4;
        j += 4;

        /* technically the sign bit could stay, so we mask it out to be sure */
        i &= INT_MAX;
    }

    if(ret != 0) {
        ret += j;
    }

    return ret;
}
Evan Teran
  • 87,561
  • 32
  • 179
  • 238
  • 1
    +1 for the bit twiddling hacks. You probably want "Count the consecutive zero bits (trailing) on the right with modulus division and lookup" or "Count the consecutive zero bits (trailing) on the right by binary search". The first is constant time. – plinth Sep 25 '09 at 15:44
  • for my benchmark ffs runs in 45% of the time this function takes, i believe ffs is a portable wrapper for BSF – dharga Sep 25 '09 at 16:04
  • plus this only works for [0,15] i need it for [0,0xFFFFFFFFFFFFFFFF] not really possible – dharga Sep 25 '09 at 16:09
  • please post the the function using ffs. perhaps we could rework the function to need to do it less? – Evan Teran Sep 25 '09 at 16:09
  • what do you mean [0,15]? BSF works for 32-bits and ffs works for `int` which can be one of a few different widths. If you need it to work for 64-bits then you can just do it multiple times with bit shifting if not found yet (similar to my example with a table which only handles a nibble at a time). – Evan Teran Sep 25 '09 at 16:13
  • ahh, this is true, but then we're shifting and looping...probably going to get pretty slow. though a neat solution – dharga Sep 25 '09 at 16:18
  • Like I said, perhaps you can avoid doing the `ffs` so often. Can you provide more information on how this is being used? – Evan Teran Sep 25 '09 at 16:19
9

The fastest I've found is ffsll(long long) in string.h.

John Carter
  • 53,924
  • 26
  • 111
  • 144
dharga
  • 2,187
  • 3
  • 24
  • 33
  • Note that `ffsll()` counts the bit positions starting from 1, so to match the question you need to do `ffsll(val) - 1`, and a result of -1 means no bits set. – John Carter Sep 26 '09 at 15:15
4

If using Visual Studio, _BitScanForward:

For gcc, try __builtin_ctz or __builtin_ffs:

As always the generated code should be consulted to ensure the correct instructions are being generated.

3

You can isolate the lowest set bit with x & (~x + 1); this gives you the lowest bit value, not the index (e.g., if x = 01101000, then the result is 00001000). The fastest way I know of to get from there to an index is probably a switch statement:

switch(x & (~x + 1))
{
  case     0ULL: index = -1; break;
  case     1ULL: index =  0; break;
  case     2ULL: index =  1; break;
  case     4ULL: index =  2; break;
  ...
  case 9223372036854775808ULL: index = 63; break;
}

Ugly, but no looping involved.

John Bode
  • 119,563
  • 19
  • 122
  • 198
  • except that now you introduce rediculous amounts of branching...not efficient either – dharga Sep 26 '09 at 12:58
  • log2(x&(~x+1))? But that might depend on whether the compiler does something smart with log2 and an unsigned long long. Maybe see http://fckinc.thegerf.net/thinktank/fast_log_two ? – Rob Sep 26 '09 at 15:43
  • @dharga: yeah, I was assuming a switch being implemented as a jump table, but a quick experiment shows that my platform generates the same code for a switch as an if-else chain, so that idea's out. Next best solution would be a lookup table, but a lookup table for 64-bit values would be a bit ... big. A combination of shifting and an 8 or 16-bit lookup table would work better, but at that point you're probably better off converting to double and using log2. – John Bode Sep 26 '09 at 16:52
2

How about implementing a kind of binary search?

Look at the low bits resulting from a bit wise and of a mask value that is all ones in the low half. If that value is zero you know the smallest bit is in the upper half of the number.

other wise cut the thing in half and go again.

Mikeb
  • 6,271
  • 3
  • 27
  • 41
  • tedious, but i believe the fastest way to do it in plain C – dharga Sep 25 '09 at 16:08
  • 1
    For 64 bit, binary search will take 6 mask-and-compare; the code originally posted on average (assuming random data) only takes two (albeit with a worst case of 64). – Steve Gilham Sep 25 '09 at 23:09
2

This might work for 32 bits. Should be easy enough to extend to 64.

// all bits left of lsb become 1, lsb & right become 0
y = x ^ (-x);

// XOR a shifted copy recovers a single 1 in the lsb's location
u = y ^ (y >> 1);

// .. and isolate the bit in log2 of number of bits
i0 = (u & 0xAAAAAAAA) ?  1 : 0;
i1 = (u & 0xCCCCCCCC) ?  2 : 0;
i2 = (u & 0xF0F0F0F0) ?  4 : 0;
i3 = (u & 0xFF00FF00) ?  8 : 0;
i4 = (u & 0xFFFF0000) ? 16 : 0;
index = i4 | i3 | i2 | i1 | i0;

Obviously, if there is some way to have the hardware do it, i.e., if special CPU instructions are available, that is the way to go.

JustJeff
  • 12,640
  • 5
  • 49
  • 63
0

How about something like this? It reduces the number of loops considerably.

int shifts = 0;
if ((bits & 0xFFFFFFFFFFFFULL) == 0) // not in bottom 48 bits
{
    shifts = 48;
}
else if ((bits & 0xFFFFFFFFFFULL == 0) // not in bottom 40 bits
{
    shifts = 40;
}
else
// etc

bits >>= shifts;  // do all the shifts at once

// this will loop at most 8 times
for(i = 0;!(bits & 1ULL);i++)
    bits >>= 1;

index = shifts + i;
Tim Allman
  • 1,501
  • 11
  • 8
0

I wrote two functions, they are returning the same result as ffsll().

int func1( uint64_t n ){
  if( n == 0 ) return 0;
  n ^= n-1;
  int i = 0;
  if( n >= 1ull<<32 ){ n>>=32; i+=32; }
  if( n >= 1ull<<16 ){ n>>=16; i+=16; }
  if( n >= 1ull<< 8 ){ n>>= 8; i+= 8; }
  if( n >= 1ull<< 4 ){ n>>= 4; i+= 4; }
  if( n >= 1ull<< 2 ){ n>>= 2; i+= 2; }
  if( n >= 1ull<< 1 ){         i+= 1; }
  return i+1;
}

int func2( uint64_t n ){
  return n? ((union ieee754_float)((float)(n^(n-1)))).ieee.exponent-126: 0;
}

I don't know which is the fastest: ffsll(), func1() or func2() ?

sambowry
  • 2,436
  • 1
  • 16
  • 13
0

Here are two implementations, first by intrinsics/assembly, second is by c/c++ (Index starts from 0)

unsigned int bsf_asm(unsigned int b)
{

    // b == 0 is undefined

#if defined( \__GNUC__ )

    return __builtin_ctz(b);

#else

    __asm bsf eax, b;

#endif

}


unsigned int bsf(unsigned int b)
{

    // b == 0 is undefined

    static const unsigned char btal[] = {0, 0, 1, 0, 2, 0, 1, 0, 3, 0, 1, 0, 2, 0, 1, 0};

    int i = 0;
    if(!(b & 0x0000ffff))
    {
        b>>=16;
        i+=16;
    }

    if(!(b & 0x000000ff))
    {
        b>>=8;
        i+=8;
    }

    if(!(b & 0x0000000f))
    {
        b>>=4;
        i+=4;
    }

    return i+btal[b&0x0f];

}
zahir
  • 1,298
  • 2
  • 15
  • 35
Orup
  • 559
  • 1
  • 5
  • 14
-1

To get the right most set bit the following expression can be used

Consider the variable as X

x & ~(x - 1) gives a binary number which contains only the set bit with the rest all zeroes

Example

x      = 0101
x-1    = 0100
~(x-1) = 1011

x & ~ (x - 1) = 0100

Now continually shift this binary number to right until the number is zero and count the number of shifts which gives the right most set bit number.

Dungeon Hunter
  • 19,827
  • 13
  • 59
  • 82
-3

you can halve the complexity of your algorithm checking first if your number is odd or even. If it is even you have the lowest order bit is the first one.

For odd cases you can implement such a binary search...

Lopoc
  • 1,308
  • 5
  • 16
  • 26
  • ok well... you're right, but it's just an operation a it doesn't involve use of loops and shift, it's just a preliminary check you can choose to act or not! – Lopoc Sep 26 '09 at 13:49