4

how can I turn off leftmost non-zero bit of a number in O(1)?

for example

n = 366 (base 10) = 101101110 (in base 2)

then after turning the leftmost non-zero bit off ,number looks like = 001101110

n will always be >0

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • @SurayansTiwari, are you sure that an `O(1)` algorithm exists? – eerorika Jan 02 '15 at 12:15
  • I am not sure that's why I asked it here –  Jan 02 '15 at 12:16
  • "after turning the leftmost non-zero bit off" so basically if the number is 00101101 you want the output to be 00001101, right? – niceman Jan 02 '15 at 12:25
  • Theoretically there should be a O( log(n) ) algorithm by using binary search to find the highest set bit. But I doubt that this will improve performance unless n is very high. – BDL Jan 02 '15 at 12:26
  • yes you are right niceman –  Jan 02 '15 at 12:28
  • dividing the number by 2 repeatedly is an O(lgN) algo – niceman Jan 02 '15 at 12:29
  • I suggest to change your example so the OPS will understand, I only noticed now – niceman Jan 02 '15 at 12:30
  • oh sorry missed that –  Jan 02 '15 at 12:30
  • i tried an O(logn) algo by repeatedly dividing the number by 2 and storing remainder in a string until the quotient is not zero, later changing the first character to 0 and then converting again to decimal gives the number – –  Jan 02 '15 at 12:33
  • if the number is read from a stream like cin or any other, then you should consider the reading process and I don't think it's O(1) algo – niceman Jan 02 '15 at 13:10

4 Answers4

4

Well, if you insist on O(1) under any circumstances, the Intel Intrinsics function _bit_scan_reverse() defined in immintrin.h does a hardware find for the most-significant non-zero bit in a int number.

Though the operation does use a loop (functional equivalent), I believe its constant time given its latency at fixed 3 (as per Intel Intrinsics Guide).

The function will return the index to the most-significant non-zero bit thus doing a simple:

n = n & ~(1 << _bit_scan_reverse(n));

should do.

This intrinsic is undefined for n == 0. So you gotta watch out there. I'm following the assumption of your original post where n > 0.

initramfs
  • 8,275
  • 2
  • 36
  • 58
  • see the comments, the bit which will be turned off is not always MSB, like this one 00111011 will turns into 00011011 – niceman Jan 02 '15 at 12:37
  • @niceman Oops, forgot to mention. The function finds the most significant non-zero bit (as opposed to just the MSB). Will edit answer. – initramfs Jan 02 '15 at 12:39
  • @SurayansTiwari Are you targeting a x86/x86_64 architecture and did you import "immintrin.h"? You need SSE enabled in your compiler to use it. – initramfs Jan 02 '15 at 12:40
  • 1
    Nice, GCC compiles this to `bsr` / `btr`. https://godbolt.org/z/8fjeoxsPE – Peter Cordes May 06 '21 at 13:15
3

n = 2^x + y. x = log(n) base 2

Your highest set bit is x.

So in order to reset that bit, number &= ~(1 << x);

Another approach:

int highestOneBit(int i) {
    i |= (i >>  1);
    i |= (i >>  2);
    i |= (i >>  4);
    i |= (i >>  8);
    i |= (i >> 16);
    return i - (i >> 1);
}

int main() {
    int n = 32767;
    int z = highestOneBit(n); // returns the highest set bit number i.e 2^x.
    cout<< (n&(~z)); // Resets the highest set bit.
    return 0;
}
thepace
  • 2,221
  • 1
  • 13
  • 21
0

Check out this question, for a possibly faster solution, using a processor instruction.

However, an O(lgN) solution is:

int cmsb(int x)
{
    unsigned int count = 0;
    while (x >>= 1) {
        ++count;
    }

    return x & ~(1 << count);
}
Community
  • 1
  • 1
Călin
  • 347
  • 2
  • 13
  • good answer, O(lgN) is better than O(N) but it's still not an answer since he wants it to be O(1) – niceman Jan 02 '15 at 12:27
  • 1
    This is O(num_bits), the number of bits in a register, which is the N in most bit-manipulation operations where we're not considering the bit-pattern as representing a number. And since you used signed `int`, it will infinite-loop if the MSB is set (i.e. if x is negative) – Peter Cordes May 06 '21 at 13:00
-1

If ANDN is not supported and LZCNT is supported, the fastest O(1) way to do it is not something along the lines of n = n & ~(1 << _bit_scan_reverse(n)); but rather...

int reset_highest_set_bit(int x)
{
    const int mask = 0x7FFFFFFF; // 011111111[...]

    return x & (mask >> __builtin_clz(x));
}
  • 1
    This is broken, e.g. `x = 0x7fffffff` makes it return `1`. Perhaps you wanted `mask >> __builtin_clz(x)` to shift by the leading-zero count? BSR gives you 31-clz, i.e. the position of the highest set bit. Also, these should be unsigned, especially `1U << 31` to avoid signed overflow in your constant expression. https://godbolt.org/z/q66Evj74K. https://godbolt.org/z/8fjeoxsPE has a working version. – Peter Cordes May 06 '21 at 13:10
  • Thanks - I though "BSF = TZCNT <=> BSR = LZCNT" (except for zero input) but the latter ones are actually totally different instructions. I don't see where `(int)1 << 31` does not result in `0x80000000` and more importantly where `((int)1 << 31) - 1` does not produce `0x7FFFFFFF`. – MrUnbelievable92 May 11 '21 at 15:08
  • 32-bit `int` (like x86 C implementations use) can't represent `+0x80000000` - that 32-bit bit-pattern represents a negative number as signed 2's complement. Thus, that left-shift of `1` (of type signed `int`) has signed overflow to INT_MIN, which is undefined behaviour in C. And then the `-1` also has undefined behaviour, wrapping back to INT_MAX. It does happens to work in this case on mainstream compilers, but there's no reason to rely on undefined behaviour when you could just write `1U` to work with unsigned numbers. That's almost always what you want for bit-manipulation. – Peter Cordes May 11 '21 at 15:51
  • It would be a lot better to simply use `INT_MAX` or `UINT_MAX>>1` - that would be portable to all `int` or `unsigned` widths supported by GNU C. (This code is now portable, now that you've replaced an x86 intrinsic with a GNU C builtin.) – Peter Cordes May 11 '21 at 15:54
  • I guess it makes sense :) Although I tested it to make sure. In my version i actually write 0x7FFFFFFF but I wanted to make a little more readable or rather inferable, although that's definitely debatable ("why "-1"?"). Thanks again, though! – MrUnbelievable92 May 12 '21 at 07:30
  • `INT_MAX` has the advantage of working for any width of `int`, including stuff like AVR or MSP430 microcontrollers where GCC uses 16-bit `int`. But sure, writing out the bit-pattern does illustrate how it works more directly. – Peter Cordes May 12 '21 at 12:43