finding the first set bit in a binary number

Question

I need to find the first set bit in a binary number from right to left; I came up with this solution:

int cnt=0;
while (number& 1 ==0)
{
    cnt++;
    number>>=1;
}

Is there a better way of doing it? Some clever bit manipulation technique?

That will find you the first set bit, from right to left. Which do you want? — Chowlett, Feb 07 '14 at 09:16
I don't currently have the time to make this into an answer, but you can use [these](http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightParallel) from bit twiddling hacks to count consecutive zeros on the right (and so, first bit set) — Hasturkun, Feb 07 '14 at 09:26
"better" in what way? Readability/Maintainability, code size, binary size, performance? In case of performance, do you have additional information on the expected input (e.g. probability distribution)? — Hulk, Feb 07 '14 at 09:33
Beware the lure of cleverness; many a great mind has been lost to its call. — molbdnilo, Feb 07 '14 at 09:46

Jarod42 · Answer 1 · 2014-02-07T10:13:22.887

11

processor may have instruction to find that directly:

Windows/MSVC:

GCC:

These typically map directly to native hardware instructions. So it doesn't get much faster than these.

But since there's no C/C++ functionality for them, they're only accessible via compiler intrinsics.

You can do it manually that way:

n & (n - 1) is a technique to remove the rightmost set bit.

So, n - (n & n - 1) will return a number with only the rightmost set bit.

then a 'log2' of the result give the solution: this link may help

You may alternatively use a switch case with all 1 << k will give you the solution

switch (n - (n & n - 1)) {
    case 0: ...
    case 1 << 0: return 0;
    case 1 << 1: return 1;
    ...
}

edited Feb 07 '14 at 10:13

answered Feb 07 '14 at 09:40

Jarod42

203,559
14
181
302

Your Answer is clearer than neverhoodboy's answer, but basically the same. Do you happen to know how the switch statement is evaluated (on machine code level)? It might turn out to be only as fast or even slower as the original solution (which had to be improved). – Ronald Feb 07 '14 at 09:47
@Ronald: I originally post this answer to explain neverhoodboy's one. I found some duplicate on SO, I will try to find the better manual technique now. – Jarod42 Feb 07 '14 at 09:50
1

I was quite happy with your answer, because I didn't know the trick `n & (n - 1)`. Anyway, the fastest portable way will probably be something like: test bytewise first and as soon as you get a hit, test the byte for set bits. That would reduce the number of operations from (on average) 32 * (if + add + shift) to 2 * if + 4 * (if + add + shift). So effectively an improvement of factor almost 8. – Ronald Feb 07 '14 at 09:57

neverhoodboy · Accepted Answer · 2014-02-07T11:23:58.317

4

If you want it to be fast, bitscan instruction (bsf, bsr) or bit-twiddling hack is the target to go.

EDIT: The idea of using switch-case table to improve performance is nothing but immature.

edited Feb 07 '14 at 11:23

answered Feb 07 '14 at 09:17

neverhoodboy

1,040
7
13

mmmmmh. 8 = 0x08 gives 0x08 - (0x08 & 0x07) = 0x08 - 0 = 0x08; 10 = 0x0a gives 0x0a - (0x0a & 0x09) = 0x0a - 0x08 = 0x02. So it seems that the first bit set remains. (This is not a proof). I think the code is a bit obscure and needs a proof as a comment. But it would be quite fast. But the mentioned switch(){...} will have 32 or 64 powers of two as the different cases (which makes it quite large). – Ronald Feb 07 '14 at 09:29
@Ronald The first set bit of `number` will be the only set bit of `t`, so you only need 32 or 64 cases in the switch. – neverhoodboy Feb 07 '14 at 09:33
Yep, that's what I stated: 32 or 64 powers of two. But in my opinion that is a quite lengthy switch. Depending on the effective machine code executed for evaluating the switch statement, it might even eat the performance you saved by the elegant way of leaving only the first bit set. – Ronald Feb 07 '14 at 09:42
1

I personally wouldn't recommend replacing an idiomatic readable loop with an obscure calculation followed by a 32- or 64-way switch, unless it's a) well documented, and b) proven to have a substantial and necessary impact on performance. – molbdnilo Feb 07 '14 at 09:44
1

@Ronald If you really want it to be fast, bitscan instruction (`bsf`, `bsr`) is the target to go. – neverhoodboy Feb 07 '14 at 09:46
2

Why so complicated? You could do `number & -number` right? Or is there some obscure standard-based objection to that? – harold Feb 07 '14 at 09:54
@Ronald I realized that switch-case is not a good idea for this problem. Either bitscan instruction or bittwiddle hack is better. The method in the original post isn't that bad either (with bug fixed of course). – neverhoodboy Feb 07 '14 at 10:15
@neverhoodboy: actually we're discussing quite a lot in order to improve some code that does 32 (or 64) iterations and perfectly fits into a processor's first level cache ;-) So I regard this all more from the academic point of view. – Ronald Feb 07 '14 at 10:19
@Ronald I'm new to SO, so I'm not sure if it's ok for me to delete this post. I don't want this immature answer to confuse anyone. The other answers made by Jarod42 and herohuyongtao are much better. – neverhoodboy Feb 07 '14 at 10:23

score 4 · Answer 3 · answered Feb 07 '14 at 09:59

4

Bit Twiddling Hacks offers an excellent collection of, er, bit twiddling hacks, with performance/optimisation discussion attached. For your problem (from that site), you can use multiply-and-lookup strategy:

unsigned int c = number;  // your input number
int r;           // result goes here
static const int MultiplyDeBruijnBitPosition[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((c & -c) * 0x077CB531U)) >> 27];

answered Feb 07 '14 at 09:59

herohuyongtao

49,413
29
133
174

1

Assuming 32 bit int this is the proper answer. – Xarn Feb 07 '14 at 10:26
1

@Xarn Another assumption is 2's complement representation? – neverhoodboy Feb 07 '14 at 11:22

Vlad from Moscow · Answer 4 · 2014-02-07T10:03:26.537

Your code

int cnt=0;
   while (number& 1 ==0)
   {
       cnt++;
       number>>=1;
   }

has a bug. For example if number is equal to 0 then the loop will be infinite.

I would rewrite it the following way

int cnt = 0;
if ( number ) while ( !( number & ( 1 << cnt++ ) ) );

In this case if number is not equal to 0 then the position (cnt) of the set bit will be count starting from 1. Otherwise the position will be equal to 0.

score 0 · Answer 5 · answered Feb 07 '14 at 10:22

I'm not sure the accepted answer is right. I just tested the naive loop versus the (num & -num) solution. They are both the same speed. The naive loop is much less code. I built with:

gcc 4.7.2 from MinGW, on Win 7 gcc test.c -o test.exe -std=c99 -Wall -O2

Here's my code (it probably has some corner case bugs, but I suspect the timings are roughly valid).

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NUM_NUMS 65536


int find_first_bits(unsigned nums[NUM_NUMS])
{
    int total = 0; // Prevent compiler from optimizing out the code
    for (int j = 0; j < 10000; j++) {
        for (int i = 0; i < NUM_NUMS; i++) {
            switch (nums[i] & -nums[i]) {
                case (1<<0): total += 1; break;
                case (1<<1): total += 2; break;
                case (1<<2): total += 3; break;
                case (1<<3): total += 4; break;
                case (1<<4): total += 5; break;
                case (1<<5): total += 6; break;
                case (1<<6): total += 7; break;
                case (1<<7): total += 8; break;
                case (1<<8): total += 9; break;
                case (1<<9): total += 10; break;
                case (1<<10): total += 11; break;
                case (1<<11): total += 12; break;
                case (1<<12): total += 13; break;
                case (1<<13): total += 14; break;
                case (1<<14): total += 15; break;
                case (1<<15): total += 16; break;
                case (1<<16): total += 17; break;
                case (1<<17): total += 18; break;
                case (1<<18): total += 19; break;
                case (1<<19): total += 20; break;
                case (1<<20): total += 21; break;
                case (1<<21): total += 22; break;
                case (1<<22): total += 23; break;
                case (1<<23): total += 24; break;
                case (1<<24): total += 25; break;
                case (1<<25): total += 26; break;
                case (1<<26): total += 27; break;
                case (1<<27): total += 28; break;
                case (1<<28): total += 29; break;
                case (1<<29): total += 30; break;
                case (1<<30): total += 31; break;
                case (1<<31): total += 32; break;
            }
        }
    }

    return total;
}


int find_first_bits2(unsigned nums[NUM_NUMS])
{
    int total = 0; // Prevent compiler from optimizing out the code
    for (int j = 0; j < 10000; j++) {
        for (int i = 0; i < NUM_NUMS; i++) {
            unsigned mask = 1;
            for (int cnt = 1; cnt <= 32; cnt++, mask <<= 1) {
                if (nums[i] & mask) {
                    total += cnt;
                    break;
                }
            }
        }
    }

    return total;
}


int main() {
    // Create some random data to test
    unsigned nums[NUM_NUMS];
    for (int i = 0; i < NUM_NUMS; i++) {
        nums[i] = rand() + (rand() << 15);
    }

    clock_t start_time, end_time;
    int result;

    start_time = clock();
    result = find_first_bits(nums);
    end_time = clock();
    printf("Time = %.5f, result = %d\n", (end_time - start_time) / (double)(CLOCKS_PER_SEC), result);

    start_time = clock();
    result = find_first_bits2(nums);
    end_time = clock();
    printf("Time = %.5f, result = %d\n", (end_time - start_time) / (double)(CLOCKS_PER_SEC), result);
}

Thank you for taking the action to test. I realized the issue as I wrote in the comments under my post. Actually I was thinking of deleting my answer so as no to confuse anyone...... — neverhoodboy, Feb 07 '14 at 10:26
Could you add test for the solution from everyone's favourite collection of bit hacks? http://graphics.stanford.edu/~seander/bithacks.html (see herohuyongtao's answer below) — Xarn, Feb 07 '14 at 10:27
The DeBruijn multiply yields a solution 60% faster than the naive loop. It takes approximately 11 cycles per number processed. — Andrew Bainbridge, Feb 08 '14 at 00:25
3x faster than the DeBruijin method is Andrew Grant's lookup table method from http://stackoverflow.com/questions/757059/position-of-least-significant-bit-that-is-set — Andrew Bainbridge, Feb 08 '14 at 00:47

finding the first set bit in a binary number

5 Answers5

Linked