fastest way to count number of set bits for an interval

Question

I need to have a fast way to count the number of set bits for an index interval for a bit vector. For example, given 10000100100011000 and index interval [2, 5], the return is 2. The index starts with 0 from the right. I have a lot of queries to be done in this fashion. Does counting the number of bits separately and get the different the best way, or if there is any preprocessing that can be done to reduce complexity?

possible duplicate of [Best algorithm to count the number of set bits in a 32-bit integer?](http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer) — Dave, Jan 26 '12 at 17:41
@Dave: I am well-aware of that problem. As I said here, I need to get the difference between the two numbers of set bits. I am asking if there is any pre-processing method to do lots of queries efficiently, or the difference method is the best. — Qiang Li, Jan 26 '12 at 17:45
Zero the range [0-1] and [6-31] then use the bit counting algo. — Dave, Jan 26 '12 at 18:10
@Dave: I am using a long bit vector, rather than what you assumed a 32-bit integer. — Qiang Li, Jan 26 '12 at 18:53
How wide can the index interval be? But yes you can preprocess it in O(n) to get O(1) queries - calculate the prefix-sum array. A query then maps to a subtraction of two items from the prefix-sum. — harold, Jan 27 '12 at 15:55
What architecture are you running on (x86/ARM) and what is the most probable length of the bit vector you're testing? For long vectors, you might want to read the data a byte at a time and use a lookup table to get the number of set bits in each byte. You can mask the "ends" of the start/end doesn't fall exactly on a byte boundary. — BitBank, Jan 31 '12 at 20:02

recursion.ninja · Answer 1 · 2013-08-24T15:13:55.413

Assumes a is the lower index and b is the higher index counting from right to left. Assumes input data v is normalized to a size of 64 bits (though modifiable for smaller values).

Data  10000100100011000
Index .......9876543210

C code:

  uint64_t getSetBitsInRange(uint64_t v, uint32_t a, uint32_t b) {
      // a & b are inclusive indexes
      if( a > b) { return ~0; } //check invariant: 'a' must be lower then 'b'

      uint64_t mask, submask_1, submask_2;
      submask_1   = submask_2 = 0x01;
      submask_1 <<= a;             // set the ath bit from the left  
      submask_1 >>= 1;             // make 'a' an inclusive index
      submask_1  |= submask_1 - 1; // fill all bits after ath bit
      submask_2 <<= b;             // set the bth bit from the left  
      submask_2  |= submask_2 - 1; // fill all bits after bth bit
      mask = submask_1 ^ submask_2;
      v &= mask;   // 'v' now only has set bits in specified range

      // Now utilize any population count algorithm tuned for 64bits
      // Do some research and benchmarking find the best one for you
      // I choose this one because it is easily scalable to lower sizes
      // Note: that many chipsets have "pop-count" hardware implementations
      // Software 64bit population count algorithm (parallel bit count):

      const uint64_t m[6] = { 0x5555555555555555ULL, 0x3333333333333333ULL,
                              0x0f0f0f0f0f0f0f0fULL, 0x00ff00ff00ff00ffULL,
                              0x0000ffff0000ffffULL, 0x00000000ffffffffULL};           
      v = (v & m[0]) + ((v >> 0x01) & m[0]);
      v = (v & m[1]) + ((v >> 0x02) & m[1]);
      v = (v & m[2]) + ((v >> 0x04) & m[2]);
      v = (v & m[3]) + ((v >> 0x08) & m[3]);   //comment out this line & below to make  8bit
      v = (v & m[4]) + ((v >> 0x10) & m[4]);   //comment out this line & below to make 16bit 
      v = (v & m[5]) + ((v >> 0x20) & m[5]);   //comment out this line to make 32bit
      return (uint64_t)v;  
    }

TemplateRex · Answer 2 · 2012-02-22T12:55:39.080

Here's a way to implement Dave's suggestion that works for all integers and std::bitset as well. The zeroing of the range's complement is done by shifting the vector to the right and left. You might want to pass T by const & if you are using very large bitsets. You might also want to watch out for implicit conversions when passing 8-bit and 16-bit integers.

// primary template for POD types
template<typename T>
struct num_bits
{
    enum { value = 8 * sizeof(T) };
};

// partial specialization for std::bitset
template<size_t N>
struct num_bits< std::bitset<N> >
{
    enum { value = N };
};

// count all 1-bits in n
template<typename T>
size_t bit_count(T n)
{
    return // your favorite algorithm
}

// count all 1-bits in n in the range [First, Last)
template<typename T>
size_t bit_count(T n, size_t First, size_t Last)
{
    // class template T needs overloaded operator<< and operator>>
    return bit_count((n >> First) << (num_bits<T>::value - Last));
}

// example: count 1-bits in the range [2, 5] == [2, 6)  
size_t result = bit_count(n, 2, 6);

fastest way to count number of set bits for an interval

2 Answers2