Ok, seems like I have a reasonable answer. First let's define binom(n,k)
as the number of ways in which we can set k
out of n
bits. That's the classic Pascal triangle:
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
...
Easily calculated and cached. Note that the sum of each line is 1<<lineNumber
.
The next thing we'll need is the partial_sum
of that triangle:
1 2
1 3 4
1 4 7 8
1 5 11 15 16
1 6 16 26 31 32
1 7 22 42 57 63 64
1 8 29 64 99 120 127 128
1 9 37 93 163 219 247 255 256
...
Again, this table can be created by summing two values from the previous line, except that the new entry on each line is now 1<<line
instead of 1
.
Let's use these tables above to construct f(x)
for an 8 bits number (it trivially generalizes to any number of bits). f(0)
still has to be 0. Looking up the 8th row in the first triangle, we see that next 8 entries are f(1)
to f(9)
, all with one bit set. The next 28 entries (7+6+5+4+3+2+1) all have 2 bits set, so that's f(10) to f(37). The next 56 entries, f(38) to f(93) have 3 bits, and there are 70 entries with 4 bits set. From symmetry we can see that they're centered around f(128), in particular they're f(94) to f(163). And obviously, the only number with 8 bits set sorts last, as f(255).
So, with these tables we can quickly determine how many bits must be set in f(i). Just do a binary search in the last row of your table. But that doesn't answer exactly which bits are set. For that we need the previous rows.
The reason that each value in the table can be created from the previous line is simple. binom(n,k) == binom(k, n-1) + binom(k-1, n-1). There are two sorts of numbers with k bits set: Those that start with a 0...
and numbers which start with 1...
. In the first case, the next n-1
bits must contain those k
bits, in the second case the next n-1
bits must contain only k-1
bits. Special cases are of course 0 out of n
and n out of n
.
This same stucture can be used to quickly tell us what f(16)
must be. We already had established that it must contain 2 bits set, as it falls in the range f(10) - f(37)
. In particular, it's number 6 with 2 bits set (starting as usual with 0). It's useful to define this as an offset in a range as we'll try to shrink the length this range from 28 down to 1.
We now subdivide that range into 21 values which start with a zero and 7 which start a one. Since 6 < 21, we know that the first digit is a zero. Of the remaining 7 bits, still 2 need to be set, so we move up a line in the triangle and see that 15 values start with two zeroes, and 6 start with 01. Since 6 < 15, f(16) starts with 00. Going further up, 7 <= 10 so it starts with 000
. But 6 == 6, so it doesn't start with 0000
but 0001
. At this point we change the start of the range, so the new offset becomes 0 (6-6)
We know need can focus only on the numbers that start with 0001
and have one extra bit, which are f(16)...f(19)
. It should be obvious by know that the range is f(16)=00010001, f(17)=00010010, f(18)=00010100, f(19)=00011000
.
So, to calculate each bit, we move one row up in the triangle, compare our "remainder", add a zero or one based on the comparison possibly go left one column. That means the computational complexity of f(x)
is O(bits)
, or O(log N)
, and the storage needed is O(bits*bits)
.