It works because you can count the total number of set bits by dividing in two halves, counting the number of set bits in both halves and then adding them up. Also know as Divide and Conquer
paradigm. Let's get into detail..
v = v - ((v >> 1) & 0x55555555);
The number of bits in two bits can be 0b00
, 0b01
or 0b10
. Lets try to work this out on 2 bits..
---------------------------------------------
| v | (v >> 1) & 0b1010 | v - x |
---------------------------------------------
0b00 0b00 0b00
0b01 0b00 0b01
0b10 0b01 0b01
0b11 0b01 0b10
This is what was required, the last column shows the count of set bits in every two bit pair. If the two bit number is >= 2 (0b10)
then and
produces 0b01
, else it produces 0b00
.
v = (v & 0x33333333) + ((v >> 2) & 0x33333333);
This statement should be easy to understand. After the first operation we have the count of set bits in every two bits, now we sum up that count in every 4 bits.
v & 0b11001100 //masks out even two bits
(v >> 2) & 0b11001100 // masks out odd two bits
We then sum up the above result, giving us the total count of set bits in 4 bits.The last statement is the most tricky.
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
Let's break it further..
v + (v >> 4)
Its similar to second statement, we are counting the set bits in groups of 4 instead. We know, because of our previous operations, that every nibble has the count of set bits in it. Lets look an example, suppose we have the byte 0b01000010
. It means the first nibble has 4bits set and the second one has 2bits set. Now we add those nibbles together.
0b01000010 + 0b01000000
It gives us the count of set bits in a byte, in the first nibble 0b01100010
and therefore we mask the last four bytes of all the bytes in the number(discarding them).
0b01100010 & 0xF0 = 0b01100000
Now every byte has the count of set bits in it. We need to add them up all together. The trick is to multiply the result by 0b10101010
which has an interesting property. If our number has four bytes, A B C D
, it will result in a new number with these bytes A+B+C+D B+C+D C+D D
. A 4 byte number can have maximum of 32 bits set, which can be represented as 0b00100000
.
All we need now is the first byte which has the sum of all set bits in all the bytes, and we get it by >> 24
. This algorithm was designed for 32 bit
word but can be easily modified for 64 bit
word.