I'll explain how the code works, so that it's easy to extend.
The LUT for 2-bit numbers can be calculated easily:
0, 1, 1, 2
That is:
- binary '00' has 0 set bits
- binary '01' has 1 set bit
- binary '10' has 1 set bit
- binary '11' has 2 set bits
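In code, this 2-bit table is what a `B2` macro can expand to. A sketch of the usual form of this macro (the names `B2` and `lut2` here are illustrative, not necessarily from your code):

```c
/* B2(n) expands to the set-bit counts of the four 2-bit numbers,
   each offset by n: n, n+1, n+1, n+2.
   B2(0) therefore yields 0, 1, 1, 2 -- the 2-bit LUT above. */
#define B2(n) n, n + 1, n + 1, n + 2

static const unsigned char lut2[] = { B2(0) };
```

The offset parameter `n` is what makes the macro reusable for the larger tables below.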
Now try to build a LUT for 4-bit numbers. There are 16 numbers, which can be enumerated as follows:
- binary '00xx', where xx is any 2-bit number
- binary '01xx', where xx is any 2-bit number
- binary '10xx', where xx is any 2-bit number
- binary '11xx', where xx is any 2-bit number
This enumeration lets you count the set bits easily (writing B2(xx) for the number of set bits in the 2-bit number xx):
- binary '00xx' has 0+B2(xx) set bits
- binary '01xx' has 1+B2(xx) set bits
- binary '10xx' has 1+B2(xx) set bits
- binary '11xx' has 2+B2(xx) set bits
So your LUT for 4-bit numbers will look like this:
0, 1, 1, 2,
1, 2, 2, 3,
1, 2, 2, 3,
2, 3, 3, 4
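This is exactly what a `B4` macro built on top of `B2` produces: four copies of the 2-bit table, each shifted by the bit count of the two high bits (0, 1, 1, 2). A sketch, again with illustrative names:

```c
#define B2(n) n, n + 1, n + 1, n + 2
/* Each row of the 4-bit table is the 2-bit table offset by the
   number of set bits among the two high bits. */
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)

static const unsigned char lut4[] = { B4(0) };
```

For example, `lut4[5]` (binary '0101') is 2, and `lut4[15]` (binary '1111') is 4.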
In the general case, if you have your LUT for N bits:
0, 1, 1, 2, 1, 2, ...
You can convert it to a LUT for N+2 bits:
0, 1, 1, 2, 1, 2, ...
1, 2, 2, 3, 2, 3, ... // all numbers as above plus 1
1, 2, 2, 3, 2, 3, ... // another row of numbers, the same
2, 3, 3, 4, 3, 4, ... // all numbers as above plus 1
Adding 1 to the previous numbers is exactly what the macros do. To extend the table to 16 bits, just add more lines:
# define B6(n) B4(n), B4(n+1), B4(n+1), B4(n+2)
# define B8(n) B6(n), B6(n+1), B6(n+1), B6(n+2)
# define BA(n) B8(n), B8(n+1), B8(n+1), B8(n+2)
...