You can count the number of set bits in a long
as follows:
long l = 1425142514251425142L; // some value
long c = l;
c = ((c >> 1) & 0x5555555555555555L) + (c & 0x5555555555555555L);
c = ((c >> 2) & 0x3333333333333333L) + (c & 0x3333333333333333L);
c = ((c >> 4) & 0x0f0f0f0f0f0f0f0fL) + (c & 0x0f0f0f0f0f0f0f0fL);
c = ((c >> 8) & 0x00ff00ff00ff00ffL) + (c & 0x00ff00ff00ff00ffL);
c = ((c >> 16) & 0x0000ffff0000ffffL) + (c & 0x0000ffff0000ffffL);
c = ((c >> 32) & 0x00000000ffffffffL) + (c & 0x00000000ffffffffL);
We basically perform O(n) times, with n the number of bits, a similar operation. For the i
'th step (with i
starting from 1
), we perform an operation like:
c = ((c >> 2i) & mask) + (c & mask);
The mask has as binary structure:
0101010101010101
0011001100110011
0000111100001111
0000000011111111
...
So for the i
-th step it is a repitition of 2i
zeros, followed by 2i
ones, and this is repeated until we reach the 64 bits.
How does this work? By shifting a 2i
places to the right, we "align" two parts of the number. The first part is the ones that are placed where the mask
has zeros, and the other part where the mask has ones. We then sum up the two.
In the first step this means that for every two bits, we align the bits to the right, sum these up, and the result of this sum (a value between 0
and 2
, both inclusive), can be represented on two bits. So now c
contains a sequence of 32 2-bit numbers that each represent the sum of the number of two bits.
In the next iteration, we again perform an alignment, and now we sum up 16 of those 2-bit numbers with their neighbor on the left (so the other 16 2-bit numbers), which can result in values from 0
to 4
, so we can represent that number of 3 bits, but we use space for 4 bits.
Each iteration we thus sum up 2i-1
numbers with other 2i
, and after O(log n), we finally sum up two n/2 bit numbers, which yield the total number of set bits.
We here make the assumption that we can add two numbers together in constant time, as well as shifting and masking in constant time. If the numbers are arbitrary large, that is not the case, hence the algorithm is, strictly speaking not O(log n), in fact for arbitary large numbers, this algorithm is even worse.
That being said, you can not count arbitrary long algorithms faster than Ω(n), since it requires at least to read over each bit once to determine its value, unless of course you know something about the structure of the number in advance that you could exploit.