When looking at such a question, you need to break it down in simple pieces.
For example, suppose that you know how many 1s there are in all numbers [0, N]
(let's call this ones(N)
), then we have:
size_t ones(size_t N) { /* magic ! */ }
size_t count(size_t A, size_t B) {
return ones(B) - (A ? ones(A - 1) : 0);
}
This approach has the advantage that one
is probably simpler to program that count
, for example using recursion. As such, a first naive attempt would be:
// Naive
size_t naive_ones(size_t N) {
if (N == 0) { return 0; }
return __builtin_popcount(N) + naive_ones(N-1);
}
But this is likely to be too slow. Even when simply computing the value of count(B, A)
we will be computing naive_ones(A-1)
twice!
Fortunately, there is always memoization to assist here, and the transformation is quite trivial:
size_t memo_ones(size_t N) {
static std::deque<size_t> Memo(1, 0);
for (size_t i = Memo.size(); i <= N; ++i) {
Memo.push_back(Memo[i-1] + __builtin_popcnt(i));
}
return Memo[N];
}
It's likely that this helps, however the cost in terms of memory might be... crippling. Ugh. Imagine that for computing ones(1,000,000)
we will occupy 8MB of memory on a 64bits computer! A sparser memoization could help (for example, only memoizing every 8th or 16th count):
// count number of ones in (A, B]
static unoptimized_count(size_t A, size_t B) {
size_t result = 0;
for (size_t i = A + 1; i <= B; ++i) {
result += __builtin_popcount(i);
}
return result;
}
// something like this... be wary it's not tested.
size_t memo16_ones(size_t N) {
static std::vector<size_t> Memo(1, 0);
size_t const n16 = N - (N % 16);
for (size_t i = Memo.size(); i*16 <= n16; ++i) {
Memo.push_back(Memo[i-1] + unoptimized_count(16*(i-1), 16*i);
}
return Memo[n16/16] + unoptimized_count(n16, N);
}
However, while it does reduce the memory cost, it does not solve the main speed issue: we must at least use __builtin_popcount
B times! And for large values of B this is a killer.
The above solutions are mechanical, they did not require one ounce of thought. It turns out that interviews are not so much about writing code than they are about thinking.
Can we solve this problem more efficiently than dumbly enumerating all integers until B
?
Let's see what our brains (quite the amazing pattern machine) picks up when considering the first few entries:
N bin 1s ones(N)
0 0000 0 0
1 0001 1 1
2 0010 1 2
3 0011 2 4
4 0100 1 5
5 0101 2 7
6 0110 2 9
7 0111 3 12
8 1000 1 13
9 1001 2 15
10 1010 2 17
11 1011 3 20
12 1100 2 22
13 1101 3 25
14 1110 3 28
15 1111 3 32
Notice a pattern ? I do ;) The range 8-15 is built exactly like 0-7 but with one more 1 per line => it's like a transposition. And it's quite logical too, isn't it ?
Therefore, ones(15) - ones(7) = 8 + ones(7)
, ones(7) - ones(3) = 4 + ones(3)
and ones(1) - ones(0) = 1 + ones(0)
.
Well, let's make this a formula:
- Reminder:
ones(N) = popcount(N) + ones(N-1)
(almost) by definition
- We now know that
ones(2**n - 1) - ones(2**(n-1) - 1) = 2**(n-1) + ones(2**(n-1) - 1)
Let's make isolate ones(2**n)
, it's easier to deal with, note that popcount(2**n) = 1
:
- regroup:
ones(2**n - 1) = 2**(n-1) + 2*ones(2**(n-1) - 1)
- use the definition:
ones(2**n) - 1 = 2**(n-1) + 2*ones(2**(n-1)) - 2
- simplify:
ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1))
, with ones(1) = 1
.
Quick sanity check:
1 = 2**0 => 1 (bottom)
2 = 2**1 => 2 = 2**0 - 1 + 2 * ones(1)
4 = 2**2 => 5 = 2**1 - 1 + 2 * ones(2)
8 = 2**3 => 13 = 2**2 - 1 + 2 * ones(4)
16 = 2**4 => 33 = 2**3 - 1 + 2 * ones(8)
Looks like it works!
We are not quite done though. A
and B
might not necessarily be powers of 2, and if we have to count all the way from 2**n
to 2**n + 2**(n-1)
that's still O(N)!
On the other hand, if we manage to express a number in base 2, then we should be able to leverage our newly acquired formula. The main advantage being than there are only log2(N) bits in the representation.
Let's pick an example and understand how it works: 13 = 8 + 4 + 1
1 -> 0001
4 -> 0100
8 -> 1000
13 -> 1101
... however, the count is not just merely the sum:
ones(13) != ones(8) + ones(4) + ones(1)
Let's express it in terms of the "transposition" strategy instead:
ones(13) - ones(8) = ones(5) + (13 - 8)
ones(5) - ones(4) = ones(1) + (5 - 4)
Okay, easy to do with a bit of recursion.
#include <cmath>
#include <iostream>
static double const Log2 = log(2);
// store ones(2**n) at P2Count[n]
static size_t P2Count[64] = {};
// Unfortunately, the conversion to double might lose some precision
// static size_t log2(size_t n) { return log(double(n - 1))/Log2 + 1; }
// __builtin_clz* returns the number of leading 0s
static size_t log2(size_t n) {
if (n == 0) { return 0; }
return sizeof(n) - __builtin_clzl(n) - 1;
}
static size_t ones(size_t n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
size_t const lg2 = log2(n);
size_t const np2 = 1ul << lg2; // "next" power of 2
if (np2 == n) { return P2Count[lg2]; }
size_t const pp2 = np2 / 2; // "previous" power of 2
return ones(pp2) + ones(n - pp2) + (n - pp2);
} // ones
// reminder: ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1))
void initP2Count() {
P2Count[0] = 1;
for (size_t i = 1; i != 64; ++i) {
P2Count[i] = (1ul << (i-1)) - 1 + 2 * P2Count[i-1];
}
} // initP2Count
size_t count(size_t const A, size_t const B) {
if (A == 0) { return ones(B); }
return ones(B) - ones(A - 1);
} // count
And a demonstration:
int main() {
// Init table
initP2Count();
std::cout << "0: " << P2Count[0] << ", 1: " << P2Count[1] << ", 2: " << P2Count[2] << ", 3: " << P2Count[3] << "\n";
for (size_t i = 0; i != 16; ++i) {
std::cout << i << ": " << ones(i) << "\n";
}
std::cout << "count(7, 14): " << count(7, 14) << "\n";
}
Victory!
Note: as Daniel Fisher noted, this fails to account for negative number (but assuming two-complement it can be inferred from their positive count).