The solutions the other guys proposed are very nice and probably the shortest to write and remain understandable. Another straight forward approach would be something like this
int bitCountLinear(long int n) {
int len = sizeof(long int)*8;
for (int i = 0; i < len; ++i)
if ((1UL<<i) > (unsigned long int)n)
return i;
return len;
}
The rest might get a bit extreme but I gave it a try so I'll share it.
I suspected that there might be arguably faster methods of doing this. eg Using binary search (even though a length of 64bits is extremely small). So I gave it a quick try for you and for the fun of it.
union long_ing_family {
unsigned long int uli;
long int li;
};
int bitCountLogN(long int num) {
union long_ing_family lif;
lif.li = num;
unsigned long int n = lif.uli;
int res;
int len = sizeof(long int)*8-1;
int max = len;
int min = 0;
if (n == 0) return 0;
do {
res = (min + max) / 2;
if (n < 1UL<<res)
max = res - 1;
else if (n >= (1UL<<(res+1)))
min = res + 1;
else
return res+1;
} while (min < max);
return min+1; // or max+1
}
I then timed both to see if they have any interesting differences...
#include <stdio.h>
#define REPS 10000000
int bitCountLinear(long int n);
int bitCountLogN(long int num);
unsigned long int timestamp_start(void);
unsigned long int timestamp_stop(void);
union long_ing_family;
int main(void) {
long int n;
long int begin, end;
long int begin_Lin, end_Lin;
long int begin_Log, end_Log;
begin_Lin = 0;
end_Lin = 0;
begin_Log = 0;
end_Log = 0;
for (int i = 0; i < REPS; ++i) {
begin_Lin += timestamp_start();
bitCountLinear(i);
end_Lin += timestamp_stop();
}
printf("Linear: %lu\n", (end_Lin-begin_Lin)/REPS);
for (int i = 0; i < REPS; ++i) {
begin_Log += timestamp_start();
bitCountLogN(i);
end_Log += timestamp_stop();
}
printf("Log(n): %lu\n", (end_Log-begin_Log)/REPS);
}
unsigned long int timestamp_start(void) {
unsigned int cycles_low;
unsigned int cycles_high;
asm volatile ("CPUID\n\t"
"RDTSCP\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t": "=r" (cycles_high), "=r" (cycles_low)::"%rax", "%rbx", "%rcx", "%rdx");
return ((unsigned long int)cycles_high << 32) | cycles_low;
}
unsigned long int timestamp_stop(void) {
unsigned int cycles_low;
unsigned int cycles_high;
asm volatile ("RDTSCP\n\t"
"mov %%edx, %0\n\t"
"mov %%eax, %1\n\t"
"CPUID\n\t": "=r" (cycles_high), "=r" (cycles_low)::"%rax", "%rbx", "%rcx", "%rdx");
return ((unsigned long int)cycles_high << 32) | cycles_low;
}
...and not surprisingly they didn't.
On my machine I'll get numbers like
Linear: 228
Log(n): 224
Which are not considered to be different assuming a lot of background noise.
Edit:
I realized that I only tested the fastest solutions for the Linear approach so changing the function inputs to
bitCountLinear(0xFFFFFFFFFFFFFFFF-i);
and
bitCountLogN(0xFFFFFFFFFFFFFFFF-i);
On my machine I'll get numbers like
Linear: 415
Log(n): 269
Which is clearly a win for the Log(n) method. I didn't expect to see a difference here.