
I know this has been asked before, but I'm looking at this particular solution listed here:

int BitCount(unsigned int u)
{
     unsigned int uCount;

     uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
     return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}

How does it work?

Are there any caveats involved here?

Theoretically, is it possible to find the answer in constant time? I mean, don't we actually have to iterate through the bits to count them?

batman
  • Unless you're dealing with multi-precision numbers, iterating through the bits is constant time --- the constant is the size in bits of integers. – Barmar Nov 01 '13 at 15:09
  • Well technically an unsigned int is 4 bytes and if you always iterate through all 32 bits in the int regardless of N, then *technically* it is O(1). Remember that O(1) means that the time complexity doesn't change with the value of N. – Shashank Nov 01 '13 at 15:09
  • Caveat is that this only works with 32-bit input (unsigned integer). So the input is already known to be of constant length, and some fancy bit arithmetic can give the answer. If the length is not constant, then iteration is almost inevitable. – Benjamin Trent Nov 01 '13 at 15:13
  • Note that if N has no upper bound then the algorithm must be O(log N). – Shashank Nov 01 '13 at 15:13
  • While that is an interesting and perhaps instructive solution, the modulo by a non-power-of-two generally makes it slower than several alternatives, such as [this one](http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel) or several table-lookups (typically with a 256-entry table). – harold Nov 01 '13 at 15:15
  • possible duplicate of [How to count the number of set bits in a 32-bit integer?](http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer) – templatetypedef Nov 01 '13 at 15:30
  • By the way, what sort of complexity model do you want to consider? In PRAM for example you can clearly do it in less than O(n) time for an n-bit integer, and while you wouldn't actually *iterate* through all bits, you would look at all of them (largely in parallel). – harold Nov 01 '13 at 15:40
  • this might interest you: http://jayaramprabhudurairaj.com/?p=348 (from that explanation "...instead of looping over the entire number, sum up the number in blocks of 3 (octal) and count them in parallel...") – גלעד ברקן Nov 01 '13 at 22:58
  • Caveat: `unsigned int` per the C spec is only at _least_ 16 bits. Example: Should a platform use a 64-bit `unsigned int`, this routine would provide incorrect results. Better to use `uint32_t` or constants and algorithm that scale with the `sizeof(unsigned int)`. – chux - Reinstate Monica Nov 02 '13 at 07:42

4 Answers


Counting bits

An unsigned 32-bit integer u can be written like this:

u = a31 * 2^31 + a30 * 2^30 + ... + a0 * 2^0

We want the value of a31 + a30 + ... + a0.

Let's compare the values of u >> k:

u >> 0  = a31 * 2^31 + a30 * 2^30 + ... + a1 * 2^1 + a0 * 2^0
u >> 1  = a31 * 2^30 + a30 * 2^29 + ... + a1 * 2^0
u >> 2  = a31 * 2^29 + a30 * 2^28 + ...
...
u >> 29 = a31 * 2^2  + a30 * 2^1  + a29 * 2^0
u >> 30 = a31 * 2^1  + a30 * 2^0
u >> 31 = a31 * 2^0

We will calculate the bit population by this formula:

u >> 0 - u >> 1 - u >> 2 - ... - u >> 31 = p

Let's see why this works:

  u >> 0 - u >> 1 - u >> 2 - ... - u >> 31
= u >> 0 - (u >> 1 + u >> 2 + ... + u >> 31)
= u - q

What's the value of q? Let's calculate it bit by bit, looking at the values for u >> k above. For a31, it's:

  a31 * 2^30 + a31 * 2^29 + ...
= a31 * (2^30 + 2^29 + ...)
= a31 * (2^31 - 1)

Or for a30:

  a30 * 2^29 + a30 * 2^28 + ...
= a30 * (2^29 + 2^28 + ...)
= a30 * (2^30 - 1)

We find: q = a31 * (2^31 - 1) + a30 * (2^30 - 1) + ...

And thus

u - q = a31 * 2^31 - a31 * (2^31 - 1) + ...
      = a31 + a30 + ... + a0
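
To make the identity concrete, here is a small check I would add (not part of the original answer); it assumes a 32-bit unsigned int, like the rest of this answer:

#include <stdio.h>

/* Apply p = u - (u >> 1) - (u >> 2) - ... - (u >> 31) literally.
   This is O(width) and exists only to verify the derivation above. */
unsigned int BitCountBySubtraction(unsigned int u)
{
    unsigned int p = u;
    for (int k = 1; k < 32; k++)
        p -= u >> k;
    return p;
}

int main(void)
{
    printf("%u\n", BitCountBySubtraction(0xFFFFFFFFu)); /* prints 32 */
    printf("%u\n", BitCountBySubtraction(0x12345678u)); /* prints 13 */
    return 0;
}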

Counting bits in 3-bit blocks

This algorithm starts by doing the same thing, but in blocks of 3 bits:

u >> 0                = AaBbbCccDddEeeFffGggHhhIiiJjjKkk (each letter is a bit)
u >> 1 & 033333333333 =  A Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk (blank = zero)
u >> 2 & 011111111111 =     B  C  D  E  F  G  H  I  J  K

Taking the difference of these, per the derivation above, each octal digit (3-bit group) of uCount contains the number of bits set in the corresponding 3-bit group of u.

uCount      =   αβγδεζηθικλ (each Greek letter is an octal digit)
uCount >> 3 =    αβγδεζηθικ

So uCount + (uCount >> 3) is (λ+κ) * 2^0 + (κ+ι) * 2^3 + (ι+θ) * 2^6 + ...

By ANDing with 030707070707, we mask away every other octal digit, so that each pair is only counted once:

r = (λ+κ) * 2^0  + (ι+θ) * 2^6  + (η+ζ) * 2^12 + ...
  = (λ+κ) * 64^0 + (ι+θ) * 64^1 + (η+ζ) * 64^2  + ...

This is a base-64 number, and we want to sum its base-64 digits to get α+β+γ+δ+ε+ζ+η+θ+ι+κ+λ, our final result. To do this, we take its base-64 digital root: since 64 ≡ 1 (mod 63), a base-64 number is congruent to the sum of its digits modulo 63, and because the result can never be bigger than 32 (which is less than 63), taking the number modulo 63 gives exactly that sum.
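
Not part of the original answer, but a small trace I would add to see the intermediate values (it assumes a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int u = 0xF00FF00Fu; /* 16 bits set */

    /* Step 1: each octal digit of uCount holds the bit count of the
       corresponding 3-bit group of u. */
    unsigned int uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
    printf("u      = %011o (octal)\n", u);
    printf("uCount = %011o (octal)\n", uCount);

    /* Step 2: add neighbouring octal digits, keep every other one,
       then take the base-64 digital root with % 63. */
    printf("count  = %u\n", ((uCount + (uCount >> 3)) & 030707070707) % 63); /* 16 */
    return 0;
}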

Lynn

Iterating the bits is constant time since the number of bits in a type is constant.

So a solution that checks a one-bit mask against the target value and shifts it once per bit is indeed O(1) (with the constant being, say, 32).
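
For concreteness, a sketch of that mask-and-shift loop (my wording, not the answerer's code):

/* One iteration per bit of unsigned int, so the work is fixed by the
   type's width, not by the value being counted. */
int BitCountLoop(unsigned int u)
{
    int count = 0;
    for (unsigned int mask = 1u; mask != 0; mask <<= 1) {
        if (u & mask)
            count++;
    }
    return count;
}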

maerics
  • I didn't downvote, and your answer is right, but I think he meant efficient time instead of O(1) complexity. The question is indeed badly worded, but it seems he just wanted to understand how this is done without loops. – Leeor Nov 01 '13 at 17:20
  • I would say O(1) time is meant to rule out solutions that loop through the number of bits (i.e. O(n) time), including low-level such solutions or routines, so I would take it to mean solutions which only use binary arithmetic **as a whole entity** and **not per bit for each bit**. Incidentally, I wanted to ask exactly the same question, but to find the position of the (most significant) set bit (i.e. [binary logarithm in O(1)](http://www.akalin.cx/constant-time-mssb)). – Nikos M. Mar 02 '15 at 12:07

The fastest way to do this is the popcnt instruction. You can usually access it through a compiler intrinsic. Your solution can still be useful on platforms that lack this instruction.
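
For example (my addition, not the answerer's; the exact names depend on the toolchain), GCC and Clang expose it as __builtin_popcount, which compiles down to popcnt when the target CPU supports it, and MSVC has a similar __popcnt intrinsic in <intrin.h>:

#include <stdio.h>

int main(void)
{
    unsigned int u = 0x12345678u;

    /* GCC/Clang builtin; emits a popcnt instruction when built with -mpopcnt
       (or -march=native on a CPU that has it), otherwise a library fallback. */
    printf("%d\n", __builtin_popcount(u)); /* prints 13 */
    return 0;
}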

Evgeny Lazin

"Counting bits set, in parallel" (from the Bit Twiddling Hacks page linked in the comments above) shows how this is done. That method can be used for 8-, 16-, 32-, 64-, 128-, etc. bit words, although the constants used in the calculations change.

When we say that this operation is O(1), we mean that it can be done in constant time regardless of the word size. Counting the bits the naive way is O(n) in the number of bits.

Speaking practically, this is only O(1) when the processor can work with the word size natively.

As for how it works, it uses "magic numbers." See this newsgroup post for an explanation.
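
The 32-bit variant on that page is, roughly, the following SWAR routine (reproduced here from memory as an illustration; check the constants against the link):

/* Count bits in parallel: build 2-bit, 4-bit, then 8-bit partial sums,
   and finally add the four byte sums with a multiply. Assumes 32-bit unsigned int. */
int PopCountParallel(unsigned int v)
{
    v = v - ((v >> 1) & 0x55555555);                 /* per-2-bit counts */
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);  /* per-4-bit counts */
    v = (v + (v >> 4)) & 0x0F0F0F0F;                 /* per-byte counts */
    return (int)((v * 0x01010101u) >> 24);           /* sum of the four bytes */
}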

Jim Mischel