4

I am unsure what something like this would be called, (hence the clumsy title) but I need something like this for something I'm working on. I can't describe it well in words, but I hope this drawing explains for me:

Binary question picture thing

What would be the fastest way of getting the number of "on-bits", "3" in this example, when everything after the arbitrary "index" (5 for example) is to be ignored?

Community
  • 1
  • 1
Anne Quinn
  • 12,609
  • 8
  • 54
  • 101
  • 3
    http://graphics.stanford.edu/~seander/bithacks.html, simple masking will do for the "ignored" part. – Mat Sep 22 '11 at 11:03
  • @Mat - ... oh, duh! I'm not sure why I thought that had to take place /after/ finding the bits. I guess that just leaves finding the on bits then. – Anne Quinn Sep 22 '11 at 11:05
  • I wonder why this operation is a bottleneck in your program... – lhf Sep 22 '11 at 11:21
  • @lhf - Never said it was! I figure if I'm gonna ask how to do something, may as well ask for the best way. :P – Anne Quinn Sep 22 '11 at 11:30
  • First make it right. Then make it fast. But measure first. – lhf Sep 22 '11 at 11:37
  • If you want to specify the index relative to the most significant 1-bit you can use __builtin_clz or _BitScanReverse to get the index of the most significant 1-bit – harold Sep 24 '11 at 12:05
  • For variable cutoff index, see [What is the efficient way to count set bits at a position or lower?](https://stackoverflow.com/a/34410357). (Which also had a special meaning for a bit, favouring shifts even more.) – Peter Cordes Jul 14 '22 at 10:23

4 Answers4

6

In addition to what has already been said, I would like to bring it to your attention that many compilers offer a build-in popcnt that may be faster than doing it manually (then again, maybe not, be sure to test it). They have the advantage of probably compiling to a single popcnt opcode if it's available in your target architecture (but I heard they do stupid slow things when falling back to a library function), whereas you'd be very lucky if the compiler detected one of the algorithms from Sean's bithacks collection (but it may).

For msvc, it's __popcnt (and variants), for gcc it's __builtin_popcount (and variants), for OpenCL (ok you didn't ask for that, but why not throw it in) it's popcnt but you must enable cl_amd_popcnt.

harold
  • 61,398
  • 6
  • 86
  • 164
  • I can always throw it in a `ifdef` block and try it out against what I wrote once I get a large enough test to properly profile it. Thanks! – Anne Quinn Sep 22 '11 at 11:23
5

First do input &= 0xfffffff8 (or the equivalent for your input type) to clear the three rightmost bits. Then take your pick from here.

Jon
  • 428,835
  • 81
  • 738
  • 806
  • I'm dumb for not masking the bits before trying to find the bits that are set. This makes things much easier. And thanks for the article. Probably gonna end up with the table lookup. – Anne Quinn Sep 22 '11 at 11:10
3

Something like the following:

int
countOnes( unsigned value, int top )
{
    assert( top >= 0 && opt < CHAR_BIT * sizeof( unsigned ) );
    value &= ~(~0 << top);
    int results = 0;
    while ( value != 0 ) {
        ++ results;
        value &= value - 1;
    }
    return results;
}

This depends on the somewhat obscure fact that i & (i - 1) resets the low order bit.

James Kanze
  • 150,581
  • 18
  • 184
  • 329
  • That's a neat property, thinking about it. And it certainly works! If it's okay, I'm gonna use something like this to build a lookup table, and go from that. – Anne Quinn Sep 22 '11 at 11:13
  • 1
    @Clairvoire **If** you never need to count on more than a byte, a lookup table is clearly the way to go. (For that matter, if you need to count on more, treating the value as an array of bytes, and summing the values obtained from a lookup table is still a very good solution.) – James Kanze Sep 22 '11 at 11:15
  • ... actually, I might need a longer data type. I'll just do it both with and without the lookup table, and profile which is better later on. – Anne Quinn Sep 22 '11 at 11:32
2

A lookup table will give you this information in constant time.

lhf
  • 70,581
  • 9
  • 108
  • 149
  • 1
    ... didn't think of that at all. And it's only 256 permutations. Thanks! – Anne Quinn Sep 22 '11 at 11:07
  • A lookup table is fine for bytes, but for types like `int`, it becomes a bit unwieldy. (And you still need the mask to trim the unwanted bits at the top.) – James Kanze Sep 22 '11 at 11:12
  • 1
    Note that lookup tables can be much less efficient (cache misses) and do not lend themselves to vectorization via SIMD. Fine if performance is not an issue though of course. – Paul R Sep 22 '11 at 11:14
  • @James - Would it be possible to perform the mask first. Then combine the results of 4 or 8 lookups using addition for an integer? Then again, after that trouble, might end up being less effective than doing the calculation itself instead – Anne Quinn Sep 22 '11 at 11:18
  • 1
    @PaulR you can actually replace the lookup table with SIMD, pshufb is awesome like that. It takes 2 pshufb's, because it's only a 16byte LUT. http://wm.ite.pl/articles/sse-popcount.html – harold Sep 22 '11 at 11:59
  • 1
    @harold: yes - that's actually a combination of two useful techniques: (i) breaking the problem down into two 4 bit population counts and (ii) using a permute instruction to perform a lookup for each nybble. – Paul R Sep 22 '11 at 12:09
  • @Claivoire Regardless of the algorithm used, you should perform the mask first. Afterwords, summing the results of n lookups is definitely a valid alternative to other algorithms. – James Kanze Sep 22 '11 at 12:27