
My question is quite similar to Fastest way of computing the power that a "power of 2" number used?:

Given x=2^y as input, I want to output y. The difference is that I'm coding in C, not C++, and I know for sure that there is only one bit set in my input, so I'm wondering if there is a more efficient way to solve this.

Here is my try:

unsigned int get_power_of_two(unsigned int x)
{
    unsigned int y=0;
    for(unsigned int input=x; input>0; input=input>>1)
    {
        if(input & 1U)
        {
            break;
        }
        y++;
    }
    return y;
}

How does its efficiency compare to the lookup table proposed in @Dave's answer? (Again, I'm coding in C, so I don't have iterator functions like lower_bound.)

n0p

5 Answers


The efficiency of your algorithm is O(log x), whereas Dave's (which performs binary search on powers of two) is O(log log x). So his is asymptotically faster.

The fastest approach, of course, is to use the BSF instruction.

Sneftel
  • I don't know `BSF` / `BSR`, but I guess it is not available on all hardware, right? – n0p Mar 06 '14 at 11:16
  • It's an i386 instruction. Similar instructions are available on other architectures. – Sneftel Mar 06 '14 at 11:17
  • TL;DR: All architectures have a leading-zero-count instruction, but C and C++ have never standardized it. It was proposed recently for C++, which might adopt it as soon as 2017. – Potatoswatter Mar 06 '14 at 11:21
  • While likely better than a linear search, it's probably worth noting here that it means very little to talk about asymptotic complexity when n is fixed to the size of your int in bits. Any solution should be tested for speed; things like branch prediction and raw instruction count are going to play a role here. In Dave's answer, the binary search could likely (but not certainly) be done faster using a mask with all bits set past the halfway point, shifted back and forth. And of course there are other efficient methods not using binary search. – Apriori Mar 06 '14 at 16:14

As a side-note, you should consider renaming function get_power_of_two to get_log_two.

If you are calling this function very often, then you can initialize a relatively small look-up table.

Using this table, you can check every input number, byte by byte, as follows:

#include <limits.h>

static unsigned int table[1<<CHAR_BIT];

void init_table() // should be called once
{
    for (unsigned int n=0; n<CHAR_BIT; n++)
        table[1<<n] = n;
}

unsigned int get_log_two(unsigned int x)
{
    for (unsigned int n=0; x>0; n+=CHAR_BIT, x>>=CHAR_BIT)
    {
        unsigned int y = x & ((1<<CHAR_BIT)-1);
        if (y > 0)
            return n+table[y];
    }
    return ~0; // will never be reached during runtime
}

This is not necessarily the most efficient method in terms of "pure academic complexity", as function get_log_two does not perform a binary search.

Nevertheless, considering the relatively small value of sizeof(unsigned int) on any platform (usually 4), it will yield pretty much the same performance in average as well as worst-case scenarios.

barak manos

Apart from the approaches already mentioned by others, such as the BSF or CLZ instructions (which strictly depend on the underlying ISA), there are other methods, such as:

http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious

In fact, you can find a lot of 'bit twiddling hacks' on that page.

Hayri Uğur Koltuk

Here is a funny idea...

If you know there is only 1 bit set, then why not use a switch?

unsigned int get_log_two(unsigned int x)
{
        switch(x)
        {
            case 1<<0: return 0;
            case 1<<1: return 1;
            case 1<<2: return 2;
            case 1<<3: return 3;
            case 1<<4: return 4;
            case 1<<5: return 5;
            case 1<<6: return 6;
            case 1<<7: return 7;
            case 1<<8: return 8;
            case 1<<9: return 9;
            case 1<<10: return 10;
            case 1<<11: return 11;
            case 1<<12: return 12;
            case 1<<13: return 13;
            case 1<<14: return 14;
            case 1<<15: return 15;
        }
        return 0;
}

Extend the cases up to 1<<31 and you'll have a good function for 32-bit inputs. This should be fast, but it will only work if exactly one bit is set.

user1574047
  • It should indeed be faster compared to the loop, because in your case you don't have to increment an iterator. But I guess it is still an O(log x) operation. – n0p Mar 06 '14 at 13:30
  • This is a constant-time method, which runs in "order 1": O(1) – user1574047 Mar 06 '14 at 14:09
  • I think this is not O(1). You have to do some number of comparisons (depending on the input) until you get to the result. – Hayri Uğur Koltuk Mar 06 '14 at 15:29
  • Of course this is O(1), because there is a constant number of cases; this will do no more than 16 comparisons, and 16 is a constant, so that's O(1). But in general there is no reason to believe that as the number of cases increases, the time taken to run the code remains constant! Switches are not magical. Think about it this way: suppose you were writing this code in a language that does not have switches -- because **that's what the compiler has to do**. How would you write it to be O(1) as the number of cases increased? – Eric Lippert Mar 06 '14 at 16:26
  • Thanks @EricLippert, I never actually thought about what a switch does internally. So I had a look at the disassembly and did a few benchmarks. 1) I saw the switch uses a repeated store command ('rep stosd') over all the labels, so it should take longer as the number of labels increases. 2) Extending the switch from 16 to 32 statements took about 2% longer to execute, not really significant (unexpected); however, the time to execute the loop code from 16 to 32 increased about 40% (expected). Once again, thank you, this was really interesting :-) – user1574047 Mar 07 '14 at 11:17
  • @user1574047: You're welcome. What a good compiler typically does is: if the switch cases are tightly clustered then it builds a jump table, which really is O(1). If they are not then it builds a binary search, which is O(lg n), or a linear search, which is O(n). Because a binary search involves more logic, there can be cases where a linear search is actually faster. The hard part is when there are parts of the switch that are clustered and parts that are spread out. A common technique is to build a binary search that figures out which of a series of jump tables to use. – Eric Lippert Mar 07 '14 at 14:17

In your case, since you know only one bit is set, it's enough to count trailing zeros. This can be done very quickly without a hardware instruction. Check out this answer; that's where the code below comes from (I'm not one to tamper with perfection... sometimes).

unsigned v;  // this is the number with one bit set
unsigned r;  // this becomes the exponent in v == pow(2, r)
static const unsigned MultiplyDeBruijnBitPosition[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((unsigned)((v & -v) * 0x077CB531U)) >> 27];

In your case, since v only has one bit set, you don't need to find the lowest set bit; therefore you can skip v & -v, and your version of the code becomes this:

unsigned v;  // this is the number with one bit set
unsigned r;  // this becomes the exponent in v == pow(2, r)
static const unsigned MultiplyDeBruijnBitPosition[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[(v * 0x077CB531U) >> 27];

See the link for more information; it in turn links to its source information.

Apriori
  • Thanks, I've read this bithack, as the link was given above, but I was struggling to understand it and adapt it to my specific case: this seems like high-level optimization! – n0p Mar 06 '14 at 16:39
  • @Coconop, multiplication bit hacks are something of a black magic. IMO the easiest way to understand it is: the multiplication produces a unique value in the most significant 5 bits for each one-bit-set input, which when shifted down to the least significant bits is in the range [0, 31]. Research was done to find this magic factor; it's not so important why this number happens to have this property, just that it does, and it makes sense that such numbers exist. Then it's just a matter of making a look-up table to match whatever power of two that maps to. – Apriori Mar 06 '14 at 17:52
  • @Coconop, the (v & -v) part that was eliminated transforms a number of any value into one with only one bit set, in the position of the least significant set bit of the original number. By two's complement negation, the trick is equivalent to (v & (~v + 1)). If you think about these operations on a binary number you will see why it works the way it does. The part above the set bit is a bitwise NOT, so the AND "cancels" it to zero; the part below is all ones until 1 is added to it, rolling over to a 1 in the place of the set bit, with 0's below. – Apriori Mar 06 '14 at 17:58
  • @Coconop, this should be very fast. It's three instructions (two operations and a memory read). So you understand the code now and how it's adapted to your special case? – Apriori Mar 06 '14 at 18:12