7

Maybe you can help me with the following problem that can help me speed a memory manager I am thinking of (I am not sure a solution exists – I did not find one).

I have a 32 bits register and I need to find if there are n consecutive set bits in it, and if so what is their offset. For example if the register holds the following value 111100000000000000000001111111000 and n equals to 4 – any of the following answer is accepted (offsets starts from 0):

3, 4, 5, 6, 28

The atomic operations I have are all the regular bitwise operations (&, |, ~, …) and also finding the least significant bit offset (3 in the register above). The algorithm (assuming one exists) – should take no more than 5 atomic operations.

Nikolay Fominyh
  • 8,946
  • 8
  • 66
  • 102
user1386966
  • 3,302
  • 13
  • 43
  • 72
  • 1
    Make a mask that has the last `n` bits set, then shift it and match it. Easy. – Qnan Aug 21 '12 at 11:13
  • @daa - I didnt know how to accept answers until recently. I tried only loops , but have no idea how to do it without it. – user1386966 Aug 21 '12 at 11:16
  • @Qnan - it is much more than 5 operation.. it's better to loop – user1386966 Aug 21 '12 at 11:16
  • I know a better way, that only looks at runs of 1's, but it's still more than 5 operations and it behaves terribly when the bits alternate between 0 and 1. – harold Aug 21 '12 at 11:18
  • It's not homework , I can do a simple loop and find the needed offset. But I truly have no idea if it can be done in O(1) – user1386966 Aug 21 '12 at 11:34
  • @user1386966 as a matter of fact it is O(1) if you say there're only 32 bits. Doing it in 5 operations is a challenge, though. – Qnan Aug 21 '12 at 11:58
  • @Qnan You can cache it in 32bit register and evaluate. Then when you want to modify the bit you can do [compare and swap](https://en.wikipedia.org/wiki/Compare-and-swap). In this case you need only one atomic operation. – KRoy Jun 19 '16 at 04:14
  • Related: [Finding consecutive bit string of 1 or 0](https://stackoverflow.com/q/3304705) for counting set bits without finding where they are. – Peter Cordes Apr 16 '22 at 02:38

5 Answers5

5

If there is an algorithm that does that, then the worst case complexity is at least O(m-n), where m is a the number of bits in the register and n is the number of consecutive set bits you are looking for. This is easy to see because if all bits are set, your algorithm will have to output exactly m-n items, so it's complexity cannot be any lower.

EDIT

There's an elegant solution to a similar problem here Looping through bits in an integer, ruby, finding the length of the longes 1 sequence.

If you know the length n of the run you're looking for in advance, this algorithm will require only n steps. The offset can then be recovered from the number of trailing zeroes in the pre-last step of the algorithm in about 5 more steps. That's not extremely efficient, but probably better than the loop-through solution, especially for a small n.

EDIT 2

If the n is known in advance, we can figure out a sequence of necesary shifts for it. E.g. if we are looking for 7 bit runs, then we'd have to do

x &= x >> 1
x &= x >> 3
x &= x >> 1
x &= x >> 1

The point is that we shift right n/2 bits if n is even or by 1 if n is odd, then update n accordingly (either n = n - 1 or n = n / 2), as @harold suggests. Estimating these values on the fly would be expensive, but if we pre-calculate them then it's going to be pretty efficient.

EDIT 3

Even better, for any n, exactly ceil(log(2,n)) steps would be required, no matter which shift we take, as long as it is between floor(n/2) and 2^floor(log(2,n-1)). See comments below.

Community
  • 1
  • 1
Qnan
  • 3,714
  • 18
  • 15
  • It only has to output one item though - the question says *any* of those offsets are ok, not that it has to find all of them. – harold Aug 21 '12 at 11:24
  • but it's enough to find one offset - I don't need all of them - the minute it finds one - it will print it out and that's it.. – user1386966 Aug 21 '12 at 11:26
  • the sample for n==7 is broken, try it on `63` for instance. The lengths that can be optimized are those that are not prime numbers. – salva Aug 21 '12 at 12:44
  • @salva Check the update. I think it's about odd vs even numbers, rather than prime or composite ones. Think about 9. It's going to be 1,4,2,1, I think. Or 11, which corresponds to 1,5,1,2,1. – Qnan Aug 21 '12 at 12:51
  • @Qnan, I think the optimum relation is both more complex than looking for prime numbers or for odds and evens. I think it goes more with finding the biggest power of two `p = (1<> (n - p))`. For instance in the n=7 case, the optimum is `x &= x >> 1; x &= x >> 2; x &= x >> 3`. For n=9, it is `x &= x >> 1; x &= x >> 2; x &= x >> 4; x &= x >> 1` – salva Aug 21 '12 at 13:10
  • Isn't the optimum always shifting by `max(1, n>>1)`? (continue with `n -= n >> 1`) Could be wrong here, I'll investigate this.. – harold Aug 21 '12 at 14:19
  • @salva looks like you're right, I think I can even prove it formally. Basically by every shift you reduce the problem to the one for a smaller `n` as long as you shift by no more than `floor(n/2)`. And then we can incrementally find the shortest sequences for numbers from 1 to 32. For each new one we have to check every valid shift up to `floor(n/2)` and find the optimum sequence. – Qnan Aug 21 '12 at 14:22
  • That gives us 2:1; 3:1,1; 4:2,1; 5:1,2,1; 6:2,2,1; 7:3,2,1; 8:4,2,1; 9:1,4,2,1; 10:2,4,2,1; 11:3,4,2,1; 12:4,4,2,1; 13:5,4,2,1; 14:6,4,2,1; 15:7,4,2,1; 16:8,4,2,1; etc. Actually, it appears that for any `n`, exactly `ceil(log(2,n))` steps would be required, no matter which shift we take, as long as it is between `floor(n/2)` and `2^floor(log(2,n-1))`, so you both are correct. – Qnan Aug 21 '12 at 14:31
  • I was looking for this, so I figured I'd share my C++ implementation ( https://stackoverflow.com/a/76127169/2963099 ) It does seems to tie out exactly as mentioned above – Glenn Teitelbaum Apr 28 '23 at 07:26
2

for every possible byte value (0-255) calculate the number of bits at the beginning, the number of bits at the end and the longest number of consecutive bits inside the byte and the offset of this sequence. For instance, for 0b11011101, there are 2 bits at the beginning, 1 bit at the end and a sequence of 3 consecutive bits in it.

Store this values in 4 arrays, for instance start, end, longest, longest_offset.

Then, consider the 32bit number as a 4 bytes array and iterate over these bytes as follows:

int search_bit_sequence(uint32 num, int desired) {
  unsigned char *bytes = (unsigned char *)&num;
  int i, acu;
  for (acu = i = 0; i < 4; i++) {
    int byte = bytes[i];
    acu += start[byte];
    if (acu >= desired)
      return (i * 8 - (acu - start[byte]));

    if (longest[byte] >= desired)
      return ( i * 8 + longest_offset[byte]);

    if (longest[byte] < 8)
      acu = end[byte];
  }
  return -1; /* not found */
}

update: notice that the endianess of your CPU may require changing the loop direction.

salva
  • 9,943
  • 4
  • 29
  • 57
  • 2
    this would be ok for searching a longer bit array, but i think it's a real overshoot if you're only concerned with one register – Qnan Aug 21 '12 at 12:21
  • 1
    Did you maybe mean `0b11011101`? Hex `0x11011101` doesn't have any consecutive set bits, just some groups of 0001 nibbles (`0x1`) and some groups of 0000 (`0x0`). – Peter Cordes Apr 16 '22 at 02:40
  • @PeterCordes, yes, you are right! – salva Apr 16 '22 at 06:51
1

The link posted by Qnan shows an elegant solution to the general case.

For particular values of m it could be further optimized.

For instance, for m == 4, you can just do:

x &= (x >> 1);
x &= (x >> 2);
// at this point, the first bit set in x indicates a 4 bit set sequence.

For m == 6 :

x &= (x >> 1);
x &= (x >> 1);
x &= (x >> 3);

In the end, that just reduces to factoring m.

update

Note also, that for high values of, it may actually be cheaper to just check for the bit sequence at every possible position.

For instance, for m = 23, the pattern can only start at positions from 0 to 9.

salva
  • 9,943
  • 4
  • 29
  • 57
1

I checked this question and answers and came up with following idea.

int i = n-1;
uint32_t y = x;
while(y && i--) {
    y = y & (y << 1);
};

After above operation y is nonzero if there is n consecutive set bits. The next thing to do is to find the least significant value set there. The following code will strip all the set bits except the least significant.

z = y - (y & (y-1));

Now that we have only one bit set, we need to find the position of the bit. We can use switch statement with 32 cases.

static inline int get_set_position(const uint32_t z) {
    switch(z) {
        case 0x1:
            return 0;
        case 0x2:
            return 1;
        ....
        .... // upto (1<<31) total 32 times.
    }
    return -1;
}

Finally to get the result we need to reduce n-1. So the total procedure is the following.

static inline int get_set_n_position(const uint32_t x, const uint8_t n) {
    if(!n) return -1;
    int i = n-1;
    uint32_t y = x;
    while(y && i--) {
        y = y & (y << 1);
    };
    if(!y) return -1;
    uint32_t z = y - (y & (y-1));
    if(!z) return -1;
    int pos = get_set_position(z);
    if(pos < 0) return -1;
    assert(pos >= (n-1));
    return pos - (n-1);
}

Now there is concern for big-endian. I think I just need to change the get_set_position() for big-endian to make it work(assuming the consecutive set bits definition changes based on endian-ness).

Let me share a tested code that uses builtin_ctzl provided by gcc.

OPP_INLINE int get_set_n_position(BITSTRING_TYPE x, const uint8_t n) {
    if(!n || n > BIT_PER_STRING) return -1;
    int i = n-1;
    while(x && i--) {
        x = x & (x << 1);
    };
    if(!x) return -1;
    int pos = __builtin_ctzl(x);
    return pos - (n-1);
}

The code works in O(1) time because 32 is constant(as @Qnan noticed). Again it works in O(n) if the size of register vary.

Note: I fixed the bugs, thanks to comments and unit-testing.

Community
  • 1
  • 1
KRoy
  • 1,290
  • 14
  • 10
  • 1
    Your initial assertion is false. If `x = 11011011` and `n=3` Then `x << (n-1)` is `01101100`. `11011011 & 01101100 = 01001000`. The result is non-zero and yet there are not 3 consecutive set bits. – Jim Mischel Jun 19 '16 at 02:55
  • @JimMischel Thanks, I fixed the bug and updated the answer now. – KRoy Jun 19 '16 at 03:15
  • 1
    That doesn't work, either. `x = 11011011`. `x << n = 11011000` The result of a bitwise AND is `11011000`. – Jim Mischel Jun 19 '16 at 03:18
  • @JimMischel thanks, It was a mistake. Now I fixed and tested the code. – KRoy Jun 19 '16 at 04:07
1

Apologies for raising this from the dead, but I actually needed the generic algo where you know the number of bits, M, ahead of time.

The comments in a very good answer: (https://stackoverflow.com/a/12053749/2963099) led me to realize that a C++ recursive solution would be sufficient to calculate a O(log(M)) where M is the number of consecutive bits being searched as follows:

#include <bit>
#include <cstdint>

template <int BITS>
struct consecutive_bits {
   static int ones(uint64_t b) {
     b &= b >> BITS/2;
     return consecutive_bits<BITS-BITS/2>::ones(b);
   }
   static int zeros(uint64_t b) {
      return ones(~b);
   }
};

template <>
struct consecutive_bits<1> {
   static int ones(uint64_t b) {
      return std::countr_zero(b);
   }
   static int zeros(uint64_t b) {
      return ones(~b);
   }
};

(You can see it on Compiler Explorer: https://godbolt.org/z/9PWbYWYoW )

The asm is quite simple, using 7 as an example (and setting -march=haswell):

mov     rax, rdi
shr     rax, 3
and     rax, rdi

mov     rdx, rax
shr     rdx, 2
and     rdx, rax

mov     rax, rdx
shr     rax
and     rax, rdx

tzcnt   rax, rax
ret

From the line breaks I added, it seems clear that it is O(M) with K being 3 asm lines: MOV, SHR, AND

Glenn Teitelbaum
  • 10,108
  • 3
  • 36
  • 80