
Below are two Java code snippets that count the number of 1 bits in an integer N.

code 1:

int count = 0;

while (N != 0) {
    N = N & (N - 1); // clears the lowest set bit of N
    count++;
}

return count;

code 2:

int i = N;
i = i - ((i >>> 1) & 0x55555555);                // each 2-bit field holds the count of its 2 bits
i = (i & 0x33333333) + ((i >>> 2) & 0x33333333); // each 4-bit field holds the count of its 4 bits
i = (i + (i >>> 4)) & 0x0f0f0f0f;                // each byte holds the count of its 8 bits
i = i + (i >>> 8);                               // sum adjacent bytes
i = i + (i >>> 16);                              // sum adjacent 16-bit halves
return i & 0x3f;                                 // the count is at most 32, which fits in 6 bits

As far as I understand, the first approach has a complexity of O(N), while the second one would be O(1), so the second one is supposed to be better.

However, I am not sure about that.

Vijay Chavda
  • http://stackoverflow.com/questions/109023/how-to-count-the-number-of-set-bits-in-a-32-bit-integer – AdamSkywalker Jul 06 '16 at 09:54
  • What is the value of N as declared initially? – Nayantara Jeyaraj Jul 06 '16 at 10:06
  • @NayantaraWilfred Not relevant at all. – JimmyB Jul 06 '16 at 10:07
  • `O` notation is about the *worst case*. *Best* case for code 1 is `N = 0`, which is when code 1 will be faster than code 2, but the worst case is for `N = 1111...1111b`. – JimmyB Jul 06 '16 at 10:09
  • The second code is only O(1) because you are doing all the work all the time, not matter how big (or small) N is. That does not mean it's faster. Also, it only works for `int`s and would have to be extended for `long` or other types with more digits. – tobias_k Jul 06 '16 at 10:11

4 Answers


You have to take some care when talking about time complexity. Big-O provides a measure of asymptotic runtimes, which means you need to measure runtimes for arbitrarily large N. Most programming languages do not provide (by default) int types that are arbitrarily large, so usually we fudge the issue a bit and pretend that they do.

The first code snippet works if we pretend int can be as large as necessary. It runs one loop iteration per 1 bit in the binary representation of N, so in the worst case (when N is all 1 bits) it takes log_2(N) iterations. So it's O(log N).

The second code snippet doesn't work (that is, it produces the wrong results) if int is larger than 32 bits. If we imagine int getting larger to support the larger Ns we need in an asymptotic analysis, then we'd need more lines in the program to support the extra bits needed to represent N. Because the program has to change to be correct, an asymptotic analysis isn't really possible. Any particular version of the program runs in O(1) time, but it doesn't produce correct results for large enough N, so it's not a fair comparison against the first code snippet.

You can try to fix the second program to adapt to the size of N by using a loop instead of hardcoded shifts and adds. Then I think it runs in O(log(log(N))) time, since each additional shift doubles the number of bits that get aggregated.

One important thing to bear in mind is that we're assuming that shifts and bitwise operations are still O(1) no matter that we're making int larger and larger. That's an assumption that is quite common to make in these analyses, but it's not something that's true on actual computers where machine instructions work on 4 or 8 byte quantities only.

Dynamic version of algorithm 2

Your question was about Java, but here's a correct O(log(log(N))) version of algorithm 2 in Python (which does have arbitrarily large ints). You can treat it as pseudo-code if you don't know Python -- but I think this is relatively readable anyhow, and converting it to use Java bignums would make it harder to understand. It computes k, the number of bits needed to represent N, rounded up to a power of 2, and then constructs a list of the shifts and masks necessary to count the bits. This takes O(log(k)) time, and since k = log(N), the overall algorithm takes O(log(log(N))) time (where here time means bitwise or arithmetic operations).

def generate_ops(k):
    # Build the (shift, mask) pairs widest-first, then reverse them
    # so that bit_count applies them narrowest-first.
    ops = []
    mask = (1 << k) - 1
    while k:
        ops.append((k, mask))
        k >>= 1
        mask = mask ^ (mask << k)
    return reversed(ops)

def bit_count(N):
    # Round the number of bits needed for N up to a power of 2.
    k = 1
    while (1 << k) < N:
        k *= 2
    # Add adjacent groups of bits, doubling the group size each time.
    for shift, mask in generate_ops(k):
        N = (N & mask) + ((N >> shift) & mask)
    return N
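
If you do want this in Java, a direct translation needs BigInteger so that N really can be arbitrarily large. Here's a rough sketch of what that might look like (my own translation of the Python above, not tested code):

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

static BigInteger bitCount(BigInteger n) {
    // Round the bit length of n up to a power of two.
    int k = 1;
    while (BigInteger.ONE.shiftLeft(k).compareTo(n) < 0) {
        k *= 2;
    }
    // Collect the (shift, mask) pairs widest-first, as generate_ops does.
    List<Integer> shifts = new ArrayList<>();
    List<BigInteger> masks = new ArrayList<>();
    BigInteger mask = BigInteger.ONE.shiftLeft(k).subtract(BigInteger.ONE);
    for (int s = k; s > 0; s >>= 1) {
        shifts.add(s);
        masks.add(mask);
        mask = mask.xor(mask.shiftLeft(s >> 1));
    }
    // Apply them narrowest-first (the reversed() in the Python version).
    for (int i = shifts.size() - 1; i >= 0; i--) {
        BigInteger m = masks.get(i);
        n = n.and(m).add(n.shiftRight(shifts.get(i)).and(m));
    }
    return n;
}
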
Paul Hankin
  • Code 2 is nothing but the *java.lang.Integer.bitCount()* helper method from the Java source code itself. The primitive `int` is 32-bit in Java, which is why the second code is written in a way that works only for 32-bit `int`s. And I did understand the Python code quite well; I will implement the same in Java. Thanks! – Vijay Chavda Jul 06 '16 at 12:08

Because O notation refers to the size/length of the input, code 1 is actually O(log2(N)) for an input of value N: log2(N) is the length of N in bits.

The loop is executed once for each bit set in N. For a 32-bit N, the worst case is N = 2^32-1 = 4294967295, and even then the loop runs no more than 32 times, not N times.
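
To see this concretely, here's a quick check of the worst case (variable names are mine):

int n = 0xFFFFFFFF; // -1 as an int: all 32 bits set, the worst case
int count = 0;
while (n != 0) {
    n = n & (n - 1); // clears the lowest set bit
    count++;
}
// count is now 32: bounded by the bit length of the input, not by its value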

JimmyB

The second algorithm runs in constant time, true, but the constant is large relative to the constant (5) and the running time (log2(N) = 32 for max N) of the 1st algorithm. Furthermore, time complexity is estimated by asymptotic analysis, i.e. how an algorithm performs for large inputs (with N approaching infinity). Asymptotic analysis is not applicable to the second algorithm, because that algorithm is limited to N fitting in 32 bits and doesn't produce correct values for larger inputs, while the 1st algorithm does.

When extending the second algorithm to larger inputs, one can notice that its time complexity grows as some f(N). In your example f(N) = 4 (at least), as you can notice 4 identical components in the 32-bit integers. The time complexity of the first algorithm also grows, though, once N stops fitting in the platform register size (e.g. 64 bits).

Serge Rogatch
  • "but the constant is very large" - No, the constant is about 20 operations for *any* N. – JimmyB Jul 06 '16 at 10:17
  • Still don't get it. Why is 20 "large" compared to 32? – JimmyB Jul 06 '16 at 10:35
  • @JimmyB, because in asymptotic analysis the constant is neglected, while 20 is quite large to be neglected compared to 3*32=96 (the number of operations in the 1st algorithm in the worst case). – Serge Rogatch Jul 06 '16 at 10:38
  • Actually the constant of the 1st algorithm seems 5 in each loop step, so 160 operations in total in the worst case. Edited. – Serge Rogatch Jul 06 '16 at 10:41
  • 6 if you count looping as an operation. But that's exactly what's irrelevant in `O` notation. – JimmyB Jul 06 '16 at 10:42

As asked, that's a meaningless question. The second piece of code is not an algorithm; it's the specialization of an algorithm for a particular size of integer. In a way the first piece of code is too: while we can in principle choose any size of int, there must be a choice, so the size of the input cannot vary. So there is not even an n to put into any big O; everything is necessarily O(1), which tells us nothing, because we looked at the wrong thing.

For both cases there is a general algorithm that works for any size of bit vector, which is useful to look at.

The first algorithm is obviously going to loop at most as many times as there are bits. So for a general bit vector that's n times, but note that this does not translate to O(n) "traditionally elementary steps", because it's calculating with n-bit bit vectors. However, if you count broadword steps (as is typical for a problem like this), there are clearly O(n) of those.

The second algorithm is trickier in its general form; some hints are given in TAOCP Volume 4A, Bitwise Tricks and Techniques. It's not actually defined there, but it becomes obvious what it should be: a tree of additions that doubles the field size at every layer, computing the popcount of each field by adding adjacent pairs of popcounts, with each layer implemented in a constant number of broadword steps (the "magic masks" are the µ_k, so we can just pretend those constants exist). The number of layers is ceil(log2(n)) thanks to the doubling, so in total the algorithm takes O(log n) broadword steps.
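
For a concrete instance, going from 32 to 64 bits adds exactly one layer to the tree (ceil(log2(64)) = 6 instead of 5); this is essentially what java.lang.Long.bitCount does:

static int bitCount64(long i) {
    i = i - ((i >>> 1) & 0x5555555555555555L);                         // 2-bit counts
    i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L); // 4-bit counts
    i = (i + (i >>> 4)) & 0x0f0f0f0f0f0f0f0fL;                         // 8-bit counts
    i = i + (i >>> 8);
    i = i + (i >>> 16);
    i = i + (i >>> 32);                                                // the extra layer
    return (int) i & 0x7f;                                             // count fits in 7 bits
}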

harold