What is fastest method to calculate a number having only bit set which is the most significant digit set in another number?

Question

Possible Duplicates:
Previous power of 2
Getting the Leftmost Bit

What I want is, suppose there is a number 5 i.e. 101. My answer should be 100. For 9 i.e. 1001, the answer should be 1000

I changed your binary representations to match the decimal ones (both were wrong), and changed one of the decimal ones; hopefully it's now what you intended to say, I wasn't positive — Michael Mrozek, Aug 04 '10 at 15:32
See previous questions which are virtually identical: http://stackoverflow.com/questions/2679815/previous-power-of-2 and http://stackoverflow.com/questions/2893525/getting-the-leftmost-bit — Paul R, Aug 04 '10 at 15:35
Wow, that is some impressive cleanup. Assuming you got it right, you managed to turn this question from incomprehensible to well asked. Of course, it's still a dupe. — nmichaels, Aug 04 '10 at 15:52
Against closing: This is IMHO a question about **performance** and **machine dependend optimization**, not how to do it. — Nordic Mainframe, Aug 04 '10 at 16:08
why are all of these questions closed? there is an answer that only applies to c++ here! — Jake Schmidt, Apr 06 '21 at 19:55

Nordic Mainframe · Answer 1 · 2010-08-04T19:31:04.743

7

You can't ask for the fastest sequence without giving constrains on the machine on which this has to run. For example, some machines support an instruction called "count leading zeroes" or have means to emulate it very quickly. If you can access this instruction (for example with gcc) then you can write:

#include <limits.h>
#include <stdint.h>
uint32_t f(uint32_t x) 
{
    return ((uint64_t)1)<<(32-__builtin_clz(x)-1);
}
int main()
{
    printf("=>%d\n",f(5));
    printf("=>%d\n",f(9));
}

f(x) returns what you want (the least y with x>=y and y=2**n). The compiler will now generate the optimal code sequence for the target machine. For example, when compiling for a default x86_64 architecture, f() looks like this:

    bsrl    %edi, %edi
    movl    $31, %ecx
    movl    $1, %eax
    xorl    $31, %edi
    subl    %edi, %ecx
    salq    %cl, %rax
    ret

You see, no loops here! 7 instructions, no branches.

But if I tell my compiler (gcc-4.5) to optimize for the machine I'm using right now (AMD Phenom-II), then this comes out for f():

    bsrl    %edi, %ecx
    movl    $1, %eax
    salq    %cl, %rax
    ret

This is probably the fastest way to go for this machine.

EDIT: f(0) would have resulted in UB, I've fixed that (and the assembly). Also, uint32_t means that I can write 32 without feeling guilty :-)

edited Aug 04 '10 at 19:31

answered Aug 04 '10 at 15:42

Nordic Mainframe

28,058
10
66
83

You're relying on a very platform specific assembly instruction, but bothering to use compatibility for machines with 7 or 9 bit chars? (`CHAR_BIT`)? :P (But +1 though) – Billy ONeal Aug 04 '10 at 15:47
IIRC, then all recent versions of gcc have __builtin_clz(x), so basically this code should be fine for all machines with gcc support. – Nordic Mainframe Aug 04 '10 at 15:52
Well, CHAR_BIT is a nice hint why this 8 is an 8. Just dropping the 8 in there would have made the code less clear. :oP indeed. – nmichaels Aug 04 '10 at 15:54
Rather than `(uint64_t)1`, you could write `1ULL`. – caf Aug 05 '10 at 00:16
"f(0) would have resulted in UB, I've fixed that" The code still contains UB, because __builtin_ctz is undefined for 0. – Rufflewind May 04 '16 at 01:56

score 6 · Answer 2 · answered Aug 04 '10 at 15:32

6

From Hacker's Delight, a nice branchless solution:

uint32_t flp2 (uint32_t x)
{
    x = x | (x >> 1);
    x = x | (x >> 2);
    x = x | (x >> 4);
    x = x | (x >> 8);
    x = x | (x >> 16);
    return x - (x >> 1);
}

This typically takes 12 instructions. You can do it in fewer if your CPU has a "count leading zeroes" instruction.

answered Aug 04 '10 at 15:32

Paul R

208,748
37
389
560

It is better if you just vote to close as exact duplicate... instead of adding yet the same answer again. – David Rodríguez - dribeas Aug 04 '10 at 15:54
@David: I would have done if I'd had any close votes left for today. As it is I posted links to two duplicates in the comments above, assuming that others would vote to close. – Paul R Aug 04 '10 at 16:02

score 4 · Answer 3 · answered Aug 04 '10 at 15:27

4

int input = 5;
std::size_t numBits = 0;
while(input)
{
    input >>= 1;
    numBits++;
}
int output = 1 << (numBits-1);

answered Aug 04 '10 at 15:27

Billy ONeal

104,103
58
317
552

score 1 · Answer 4 · answered Aug 04 '10 at 16:09

This is a task related to the bit counting. Check this out.

Using the 2a (which is my favorite of the algorithms; not the fastest) one can come up with this:

int highest_bit_mask (unsigned int n)  {
   while (n)  {
      if (n & (n-1)) {
         n &= (n-1) ;
      } else {
         return n;
      }
   }
   return 0;
}

The magic of n &= (n-1); is that it removes from n the least significant bit. (Corollary: n & (n-1) is false only when n has precisely one bit set.) The algorithm complexity depends on number of bits set in the input.

Check out the link anyway. It is a very amusing and enlightening read which might give you more ideas.

What is fastest method to calculate a number having only bit set which is the most significant digit set in another number?

4 Answers4

Linked