2

In C++, I would like to compute the number of bits that is 1 in a uint64_t

# A few examples in uint8_t:
0b00000000  ==> 0
0b01000001  ==> 2
0b00110101  ==> 4
0b11111111  ==> 8

I need to perform a lot (say millions) of such operations, is there a fast way to do so (maybe with some pre-processing)?

One possibility I can think of is to create a lookup table for every N bit (say N=8), and perform 64/N lookups for a uint64_t. Is there a better solution for this? If I decide to go for the above approach, how should I set N? (To my understanding even ignoring all pre-processing computation, it is not the case that the larger N is, the better, Because the look-up speed actually depends on whether the look-up table can fit in cache?)

Thanks a lot!

Ying Xiong
  • 4,578
  • 8
  • 33
  • 69

2 Answers2

5

The fastest way is __builtin_popcountll on GCC and Clang and _mm_popcnt_u64 for VC++.
When compiled with appropriate flags (e.g. -march=native), this should yield optimal performance as the intrinsic is reduced to the assembler instruction POPCNT that was introduced together with SSE 4.2. Demo.

And if that's not applicable try a Bit twiddling Hack using lookup tables.

Columbo
  • 60,038
  • 8
  • 155
  • 203
2

There's the popcnt (population count) instruction which is very likely the fastest way to get the number of 1 bits.
Note that your processor must support SSE4.2.
You can use both intrinsics and builtin function, but I suggest intrinsics function because they're supported by all major compilers:

#include "nmmintrin.h"
auto bitsOn = _mm_popcnt_u64(10);
edmz
  • 8,220
  • 2
  • 26
  • 45