
I searched for an algorithm that counts the number of ones in a byte with O(1) time complexity, and this is what I found on Google:

// C++ implementation of the approach  
#include <bits/stdc++.h> 
using namespace std; 
  
int BitsSetTable256[256]; 
  
// Function to initialise the lookup table  
void initialize()  
{  
  
    // To initially generate the  
    // table algorithmically  
    BitsSetTable256[0] = 0;  
    for (int i = 0; i < 256; i++) 
    {  
        BitsSetTable256[i] = (i & 1) +  
        BitsSetTable256[i / 2];  
    }  
}  
  
// Function to return the count  
// of set bits in n  
int countSetBits(int n)  
{  
    return (BitsSetTable256[n & 0xff] +  
            BitsSetTable256[(n >> 8) & 0xff] +  
            BitsSetTable256[(n >> 16) & 0xff] +  
            BitsSetTable256[n >> 24]);  
}  
  
// Driver code  
int main()  
{  
    // Initialise the lookup table  
    initialize();  
    int n = 9;  
    cout << countSetBits(n); 
}  
  

I understand that I need an array of size 256 (in other words, the size of the lookup table) for indexing from 0 to 255, which are all the decimal values a byte can represent!

But in the function initialize I didn't understand the expression inside the for loop: BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
Why am I doing that?! I don't understand the purpose of this line of code inside the for loop.

In addition, the function countSetBits returns:

 return (BitsSetTable256[n & 0xff] +  
            BitsSetTable256[(n >> 8) & 0xff] +  
            BitsSetTable256[(n >> 16) & 0xff] +  
            BitsSetTable256[n >> 24]); 

I don't understand at all what this is doing, what the bitwise AND with 0xff is for, and why I'm doing a right shift. Could anyone please explain the concept to me?! I also don't understand why, in countSetBits, the term BitsSetTable256[n >> 24] isn't ANDed with 0xff.

I understand why I need a lookup table of size 2^8, but I didn't understand the other lines of code I mentioned above. Could anyone please explain them to me in simple words? And what is the purpose of counting the number of ones in a byte?

Thanks a lot, guys!

LiamLony
  • I don't think a lookup would be faster.. – apple apple Sep 08 '20 at 12:54
  • btw it's pretty unlikely to exceed `O(1)` for a `byte` – apple apple Sep 08 '20 at 12:54
  • but this algorithm is from Google, and I'm trying to understand it; and yeah, it's O(1) time complexity – LiamLony Sep 08 '20 at 12:56
  • If this is for real world use, rather than just an academic exercise, Intel added the `popcnt` instruction in the Nehalem architecture. – Colin Sep 08 '20 at 13:02
  • If you want fast use [popcnt](https://en.wikipedia.org/wiki/SSE4#POPCNT_and_LZCNT) to do it in a single instruction – Alan Birtles Sep 08 '20 at 13:13
  • For your question of why this count is interesting. It is called the Hamming weight, and wikipedia can tell you more about the various uses: https://en.wikipedia.org/wiki/Hamming_weight – Frodyne Sep 08 '20 at 13:16
  • And here: https://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer is more discussion on how to calculate it efficiently than you can shake a stick at. – Frodyne Sep 08 '20 at 13:18
  • Do not overthink this, since the compiler is smarter than you might expect, see https://youtu.be/bSkpMdDe4g4 https://godbolt.org/z/dYdMdY – Marek R Sep 08 '20 at 13:21
  • C++20 added [`std::popcount`](https://en.cppreference.com/w/cpp/numeric/popcount). Although it does not make any guarantees, it stands to reason that the implementation in the standard library might have competitive performance. If in doubt benchmark. – bitmask Sep 08 '20 at 14:32

3 Answers


Concerning the first part of question:

// Function to initialise the lookup table  
void initialize()  
{  
  
    // To initially generate the  
    // table algorithmically  
    BitsSetTable256[0] = 0;  
    for (int i = 0; i < 256; i++) 
    {  
        BitsSetTable256[i] = (i & 1) +  
        BitsSetTable256[i / 2];  
    }  
}  

This is a neat kind of recursion. (Please note that I don't mean "recursive function" but recursion in a more mathematical sense.)

The seed is BitsSetTable256[0] = 0;

Then every element is initialized using the (already computed) result for i / 2, adding 1 or 0 on top. Thereby,

  • 1 is added if the last bit of index i is 1
  • 0 is added if the last bit of index i is 0.

To get the value of the last bit of i, i & 1 is the usual C/C++ bit-mask trick.

Why is the result of BitsSetTable256[i / 2] a value to build upon? Because BitsSetTable256[i / 2] is the number of set bits of i with the last bit excluded.

Please note that i / 2 and i >> 1 (the value shifted to the right by 1, whereby the least significant/last bit is dropped) are equivalent expressions (for positive numbers in the respective range, edge cases excluded).
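
To make the recurrence concrete, here is a small sketch (my own illustration, not part of the original code) that fills and prints the first 16 table entries together with the values they are built from:

#include <iostream>

int BitsSetTable256[256];

int main()
{
    BitsSetTable256[0] = 0;
    for (int i = 1; i < 16; i++)   // the first 16 entries are enough to see the pattern
    {
        BitsSetTable256[i] = (i & 1) + BitsSetTable256[i / 2];
        std::cout << "i = " << i
                  << ", last bit = " << (i & 1)
                  << ", table[i / 2] = " << BitsSetTable256[i / 2]
                  << ", table[i] = " << BitsSetTable256[i] << '\n';
    }
}

For i == 5 (101b), for example, this prints table[5] = 1 + table[2] = 1 + 1 = 2, which is exactly the number of set bits in 5.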


Concerning the other part of the question:

    return (BitsSetTable256[n & 0xff] +  
            BitsSetTable256[(n >> 8) & 0xff] +  
            BitsSetTable256[(n >> 16) & 0xff] +  
            BitsSetTable256[n >> 24]);
  • n & 0xff masks out the upper bits, isolating the lower 8 bits.
  • (n >> 8) & 0xff shifts the value of n 8 bits to the right (whereby the 8 least significant bits are dropped) and then again masks out the upper bits, isolating the lower 8 bits.
  • (n >> 16) & 0xff shifts the value of n 16 bits to the right (whereby the 16 least significant bits are dropped) and then again masks out the upper bits, isolating the lower 8 bits.
  • n >> 24 shifts the value of n 24 bits to the right (whereby the 24 least significant bits are dropped), which should effectively turn the upper 8 bits into the lower 8 bits.

Assuming that int and unsigned int have 32 bits (as they usually do on today's common platforms), this covers all bits of n.

Please note that the right shift of a negative value is implementation-defined.

(I recalled Bitwise shift operators to be sure.)

So, a right shift of a negative value may fill all upper bits with 1s.
That can break BitsSetTable256[n >> 24]: n >> 24 may then be negative,
and hence BitsSetTable256[n >> 24] an out-of-bounds access.

The better solution would've been:

    return (BitsSetTable256[n & 0xff] +  
            BitsSetTable256[(n >> 8) & 0xff] +  
            BitsSetTable256[(n >> 16) & 0xff] +  
            BitsSetTable256[(n >> 24) & 0xff]);
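
If the implementation-defined right shift of negative values is also a concern, one possible workaround (just a sketch of mine, not part of the quoted code; it assumes the BitsSetTable256 table from the question has been initialised) is to convert n to unsigned before shifting, so every shift and every index is well defined:

// Sketch: operate on an unsigned copy so the right shifts are well defined
// (assumes BitsSetTable256 has been filled by initialize())
int countSetBits(int n)
{
    unsigned int u = static_cast<unsigned int>(n);  // conversion is well defined (modulo 2^32)
    return (BitsSetTable256[u & 0xff] +
            BitsSetTable256[(u >> 8) & 0xff] +
            BitsSetTable256[(u >> 16) & 0xff] +
            BitsSetTable256[(u >> 24) & 0xff]);
}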
Scheff's Cat
  • Exactly. The initialization of the lookup table uses a recurrence relation of the bit count. – Codor Sep 08 '20 at 13:10
  • For i & 1 I understand, but I'm still wondering about BitsSetTable256[i / 2]: why not BitsSetTable256[i - 1] or BitsSetTable256[i / 3] or BitsSetTable256[i % 2]? Why exactly i / 2? I didn't get why I need to divide the index by 2. Who says that if we divide the index by 2 we get the number of ones of i with the last one excluded? Appreciated for any help – LiamLony Sep 08 '20 at 13:24
  • The argument is a bit involved. The division by two is the same as a shift to the right. It removes the least significant bit (which is taken care of by the other expression) and reads the number of bits in the shifted value - which "just happens" to be _smaller_, so the lookup table already contains the desired value inductively. – Codor Sep 08 '20 at 13:36
  • This type of "recursion" is also known as [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming). – bitmask Sep 08 '20 at 14:36
  • @bitmask I somehow missed dynamic programming when I was a student. (I admit that's some decades ago.) I always tried to close up but was uncertain whether dynamic programming is either something quite obvious or something very contrived I'm unable to catch... ;-) – Scheff's Cat Sep 08 '20 at 14:42
  • @Scheff Let me help you out then; It's the former. In 99% of the cases it's just a fancy word for "shuffle the order of computation around in a clever enough fashion that you can use table-assist for the more complicated problem-instances using the simpler ones". – bitmask Sep 08 '20 at 14:46
  • @Scheff I would totally second that; in my own studies, I always felt that dynamic programming was somehow badly explained. – Codor Sep 08 '20 at 18:14
BitsSetTable256[0] = 0;
...
BitsSetTable256[i] = (i & 1) +  
     BitsSetTable256[i / 2];  

The above code seeds the lookup table, where each index contains the number of ones in the number used as the index. It works as follows:

  • (i & 1) gives 1 for odd numbers, otherwise 0.
  • An even number will have as many binary 1 as that number divided by 2.
  • An odd number will have one more binary 1 than that number divided by 2.

Examples:

  • if i==8 (1000b) then (i & 1) + BitsSetTable256[i / 2] ->
    0 + BitsSetTable256[8 / 2], and index 4 (0100b) holds 1, giving 0 + 1.
  • if i==7 (0111b) then 1 + BitsSetTable256[7 / 2] = 1 + BitsSetTable256[3], and index 3 (0011b) holds 2, giving 1 + 2.

If you want some formal mathematical proof why this is so, then I'm not the right person to ask, I'd poke one of the math sites for that.
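
If you don't need a formal proof, a quick sanity check (a sketch of mine, not part of this answer) is to compare the recurrence against a naive bit-counting loop for every possible byte value:

#include <cassert>

int main()
{
    int table[256];
    table[0] = 0;
    for (int i = 1; i < 256; i++)
        table[i] = (i & 1) + table[i / 2];   // the recurrence in question

    for (int i = 0; i < 256; i++)
    {
        int naive = 0;                       // count the bits the slow way
        for (int v = i; v != 0; v >>= 1)
            naive += v & 1;
        assert(table[i] == naive);           // the recurrence matches for every byte value
    }
}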


As for the shift part, it's just the normal way of splitting up a 32-bit value into 4x8 bits, portably and without caring about endianness (any other method to do that is highly questionable). If we un-sloppify the code, we get this:

BitsSetTable256[(n >>  0) & 0xFFu] +  
BitsSetTable256[(n >>  8) & 0xFFu] +  
BitsSetTable256[(n >> 16) & 0xFFu] +  
BitsSetTable256[(n >> 24) & 0xFFu] ;  

Each byte is shifted into the LS byte position, then masked out with a & 0xFFu byte mask.

Using bit shifts on int is however a code smell and potentially buggy. To avoid poorly-defined behavior, you need to change the function to this:

#include <stdint.h>

uint32_t countSetBits (uint32_t n);
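
A possible sketch of the whole thing with that signature (my own adaptation of the question's code, not a drop-in from any library) could look like this:

#include <stdint.h>
#include <stdio.h>

static uint32_t BitsSetTable256[256];

static void initialize (void)
{
    BitsSetTable256[0] = 0;
    for (uint32_t i = 1; i < 256; i++)
        BitsSetTable256[i] = (i & 1u) + BitsSetTable256[i / 2];
}

uint32_t countSetBits (uint32_t n)
{
    return BitsSetTable256[(n >>  0) & 0xFFu] +
           BitsSetTable256[(n >>  8) & 0xFFu] +
           BitsSetTable256[(n >> 16) & 0xFFu] +
           BitsSetTable256[(n >> 24) & 0xFFu];
}

int main (void)
{
    initialize();
    printf("%u\n", (unsigned)countSetBits(9u));  // 9 is 1001b, so this prints 2
}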
Lundin

The code in countSetBits takes an int as an argument; apparently 32 bits are assumed. The implementation extracts four single bytes from n by shifting and masking; for these four separated bytes, the lookup table is used and the per-byte bit counts are added to yield the result.

The initialization of the lookup table is a bit more tricky and can be seen as a form of dynamic programming. The entries are filled in order of increasing index. The first expression masks out the least significant bit and counts it; the second expression halves the argument (which could also be done by shifting). The halved argument is smaller, so it is correctly assumed that the necessary value for it is already available in the lookup table.

For the access to the lookup table, consider the following example:

input value (contains 5 ones):
01010000 00000010 00000100 00010000

input value, shifting is not necessary
masked with 0xff (11111111)
00000000 00000000 00000000 00010000     (contains 1 one)

input value shifted by 8
00000000 01010000 00000010 00000100
and masked with 0xff (11111111)
00000000 00000000 00000000 00000100     (contains 1 one)

input value shifted by 16
00000000 00000000 01010000 00000010
and masked with 0xff (11111111)
00000000 00000000 00000000 00000010     (contains 1 one)

input value shifted by 24,
masking is not necessary
00000000 00000000 00000000 01010000     (contains 2 ones)

The extracted values have only the lowermost 8 bits set, which means that the corresponding entries are available in the lookup table. The entries from the lookup table are added. The underlying idea is that the number of ones in the argument can be calculated byte-wise (in fact, any partition into bitstrings would be suitable).
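
As a small illustration (my own sketch, not part of this answer), one can print the four extracted bytes of the example value and their per-byte counts:

#include <bitset>
#include <iostream>

int main()
{
    // the example value from above, 5 ones in total (requires C++14 binary literals)
    unsigned int n = 0b01010000'00000010'00000100'00010000;

    for (int shift = 0; shift < 32; shift += 8)
    {
        std::bitset<8> byte((n >> shift) & 0xff);        // extract one byte
        std::cout << "shift " << shift << ": " << byte
                  << " -> " << byte.count() << " one(s)\n";
    }
}

This prints the same four byte patterns as the walkthrough above, with counts 1, 1, 1 and 2, summing to 5.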

Codor
  • Might be worth mentioning that `24 + 8 = 32` and since we're assuming the `int` is 32 bits, `n >> 24` won't exceed 256, which is why there's no bitmask on that table lookup. (since this is something OP specifically asked about in their question) – scohe001 Sep 08 '20 at 13:00
  • That function is actually incorrect when passed a negative value (`n >> 24` may be negative, depending on platform). It should have taken a `uint32_t` parameter type. – interjay Sep 08 '20 at 13:04
  • Hi, sorry, I didn't understand. Could you please explain with an example why I'm doing the right shifting and why, in the lookup table, I'm dividing the index i by 2? Still blurred and confused... Let's assume n=1, which as a byte is 00000001, so I need a lookup table with 2^8 indexes; what does the code do after that? Thanks a lot for any clarifications. I know it's counting the number of ones, but how, and why do I need the right shifting? – LiamLony Sep 08 '20 at 13:09