0

What would be the best way to identify all the set bit positions in a 64 bit bitmask. Suppose my bit mask is 0xDeadBeefDeadBeef, then what is the best way, to identify all the bit positions of the set bits in it.

long long bit_mask = 0xdeadbeefdeadbeef;
unsigned int bit_pos=0;
while(mask) {
  if((mask&1)==1) {
     printf("Set bit position is:%d \n",bit_pos};
  }
  bit_pos++;
  mask>>=1; 
}

One way is to loop through it, and check if a bit is set or not, if it is set, Return the count position and continue looping until the MSB, so for 64 bits, I would iterate until I have all the set bits traversed or all 64 bits traversed, if MSB is set, but there must be a better way of doing it?

Viks
  • 760
  • 4
  • 12
  • 26
  • 2
    Do you want a _count_ of the number of bits set? Or some binaray printout like "1101 1110 1010 1101 ...". – chux - Reinstate Monica Oct 14 '13 at 17:32
  • 1
    Do you have access to a "count trailing zeroes" or "popcnt" instruction? – harold Oct 14 '13 at 17:38
  • 1
    are you trying to count the number of ones or are you interested in the positions of the ones? – old_timer Oct 15 '13 at 12:51
  • 1
    Since the example algorithm offered stops when it encounters the first set bit, I think it's the position that's desired, not the total number of set bits. It would also make sense given the context offered by [another of the poster's questions](http://stackoverflow.com/questions/19236657/how-to-randomly-pick-a-value-based-on-the-position-of-the-bit). Apparently the positions of the set bits is desired so one can be selected randomly. – D Krueger Oct 15 '13 at 15:56

5 Answers5

2

Algorithm from Hacker's Delight (book):

int count_bits(long long s)
{
    s = (s&0x5555555555555555L) + ((s>>1)&0x5555555555555555L);
    s = (s&0x3333333333333333L) + ((s>>2)&0x3333333333333333L);
    s = (s&0x0F0F0F0F0F0F0F0FL) + ((s>>4)&0x0F0F0F0F0F0F0F0FL);
    s = (s&0x00FF00FF00FF00FFL) + ((s>>8)&0x00FF00FF00FF00FFL);
    s = (s&0x0000FFFF0000FFFFL) + ((s>>16)&0x0000FFFF0000FFFFL);
    s = (s&0x00000000FFFFFFFFL) + ((s>>32)&0x00000000FFFFFFFFL);

    return (int)s;
}
X.J
  • 662
  • 3
  • 6
1

Besides already explained nice bit twiddling hacks, there are other options.

This assumes that you have x86(64), SSE4, gcc and compile with -msse4 switch you can use:

int CountOnesSSE4(unsigned int x)
{
    return __builtin_popcount(x);
}

This will compile into single popcnt instruction. If you need fast code you can actually check for SSE at runtime and use best function available.

If you expect number to have small number of ones, this could also be fast (and is always faster than the usual shift and compare loop):

int CountOnes(unsigned int x)
{
    int cnt = 0;
    while (x) {
        x >>= ffs(x);
        cnt++;
    }
    return cnt;
}

On x86 (even without SSE) ffs will compile into single instruction (bsf), and number of loops will depend on number of ones.

dbrank0
  • 9,026
  • 2
  • 37
  • 55
0

You might do this:

long long bit_mask = 0xdeadbeefdeadbeef;

int i;
for (i = 0; i < (sizeof(long long) * 8); i++) {
    int res = bit_mask & 1;
    printf ("Pos %i is %i\n", i, res);
    bit_mask >>= 1;

}
Atle
  • 1,867
  • 12
  • 10
  • 2
    you should compile and run that and see if you are happy with the result. – Charlie Burns Oct 14 '13 at 17:17
  • Yes, But it means, it will traverse through all the 0's and 1's, is there a way, which provides a better way of doing it? – Viks Oct 14 '13 at 17:17
  • Sorry. I fixed my errors (it's now bitshifting the correct way, and it uses `int res` to fit the `%i`). – Atle Oct 14 '13 at 17:49
0

It depends if you want clarity in your code or a very fast result. I almost always choose clarity in the code unless profiling tells me otherwise. For clarity, you might do something like:

int count_bits(long long value) {
    int n = 0;
    while(value) {
        n += (value & 1);
        value >>= 1;
   }
   return n;
}

For performance you might want to call count_bits from X J's answer.

int count_bits(long long s)
{
    s = (s&0x5555555555555555L) + ((s>>1)&0x5555555555555555L);
    s = (s&0x3333333333333333L) + ((s>>2)&0x3333333333333333L);
    s = (s&0x0F0F0F0F0F0F0F0FL) + ((s>>4)&0x0F0F0F0F0F0F0F0FL);
    s = (s&0x00FF00FF00FF00FFL) + ((s>>8)&0x00FF00FF00FF00FFL);
    s = (s&0x0000FFFF0000FFFFL) + ((s>>16)&0x0000FFFF0000FFFFL);
    s = (s&0x00000000FFFFFFFFL) + ((s>>32)&0x00000000FFFFFFFFL);

    return (int)s;
}

It depends if you want to look through your code and say to yourself, "yeah, that makes sense" or "I'll take that guy's word it".

I've been called out on this before in stack overflow. Some people do not agree. Some very smart people choose complexity over simplicity. I believe clean code is simple code.

If performance calls for it, use complexity. If not, don't.

Also, consider a code review. What are you going to say when someone says "How does count_bits work?"

Charlie Burns
  • 6,994
  • 20
  • 29
0

If you are counting the ones you can use the hackers delight solution which is fast, but a lookup table can be (isnt always) faster. And much more understandable. You could pre-prepare a table for example 256 items deep that represent the counts for the byte values 0x00 to 0xFF

0, //0x00
1, //0x01
1, //0x02
2, //0x03
1, //0x04
2, //0x05
2, //0x06
3, //0x07
...

The code to build that table would likely use the slow step through every bit approach.

Once built though you can break your larger number into bytes

count  = table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
count += table8[number&0xFF]; number>>=8;
...

if you have more memory you can make the table even bigger by representing wider numbers, a 65536 deep table for the numbers 0x0000 to 0xFFFF.

count  = table16[number&0xFFFF]; number>>16;
count += table16[number&0xFFFF]; number>>16;
count += table16[number&0xFFFF]; number>>16;
count += table16[number&0xFFFF]; number>>16;

Tables are a general way to make things like this faster at the expense of memory consumption. The more memory you are able to consume the more you can pre-compute (at or before compile time) rather than real-time compute.

old_timer
  • 69,149
  • 8
  • 89
  • 168