3

I am trying to find an optimal code to locate a single bit index in a long integer (64 bit). The long integer has exactly one set bit. (Using C language)

Currently, I am just shifting the whole thing by one bit, then checking for zero. I have read about lookup tables, but it won't do for the whole 64 bits. I thought about checking each 8 bits for zero, if not use a lookup, but still I'll have to shift by 8 at a time. (Shifting by 8 is better than shifting by one 8 times?)

(Note: I am developing for mobile devices, and they are [not surprisingly] slow).

Mazyod
  • 22,319
  • 10
  • 92
  • 157
  • 3
    Why are you trying to find the optimal way? Shifting bits works just fine — don't over-complicate it unless you can measure a performance problem. – Andrew Feb 21 '13 at 08:51
  • Ditto to Andrew. if you're that concerned about performance *and* you've somehow deduced this is the bottle-kneck, break on the inline assembler. Until then, don't worry about it. – WhozCraig Feb 21 '13 at 08:52
  • I can measure it, but I need a better algorithm to compare. I happen to be working on a game logic, and the code executes more than 1000 times in a fraction of a second. – Mazyod Feb 21 '13 at 08:54
  • It doesn't matter how slow this algorithm is compared to other ones - if your game only spends 0.1% of it's time in this function, any gains are worthless anyways. – charliehorse55 Feb 21 '13 at 08:56
  • 1
    Current processors actually have hardware support for this type of operation. But you'll need to use compiler extensions to access them. – Mysticial Feb 21 '13 at 09:00
  • @charliehorse55 yeah, I understand, but according to my calculations, by optimizing the bit count and bit index operations, and even the memory allocation, I will at least get a notable boost. – Mazyod Feb 21 '13 at 09:02

3 Answers3

5

Whenever I need some way to manipulate bits, I always look for Bit Twiddling Hacks. It has few solutions for your problem as well.

This solution seems to be fast and most advanced:

Count the consecutive zero bits (trailing) on the right in parallel

unsigned int v;      // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -signed(v);
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

The number of operations is at most 3 * lg(N) + 4, roughly, for N bit words.

mvp
  • 111,019
  • 13
  • 122
  • 148
  • What is the `signed()` function meant to do? That's neither clear here nor in the hacks page. – paxdiablo Feb 21 '13 at 11:19
  • 1
    This is used to cast unsigned number v as signed. It could have been `-(signed)v` instead. – mvp Feb 21 '13 at 16:38
5

You can do a binary search for the bit that's set:

int bitn(unsigned long long x)
{
    int n = 0;

    if (x >> 32) {
        n += 32;
        x >>= 32;
    }
    if (x >> 16) {
        n += 16;
        x >>= 16;
    }
    if (x >> 8) {
        n += 8;
        x >>= 8;
    }
    if (x >> 4) {
        n += 4;
        x >>= 4;
    }
    if (x >> 2) {
        n += 2;
        x >>= 2;
    }
    if (x >> 1) {
        n += 1;
    }

    return n;
}

GCC provides a builtin, __builtin_ctzll(), that performs this function (which will take advantage of any special functionality that the hardware might have to do this quickly).

caf
  • 233,326
  • 40
  • 323
  • 462
4

You should check this suggestion (and any others given here) against your current code - you may find that the bitshifting is the most efficient way, or that the difference is minuscule in which case you should optimise for readability.

In any case, consider this something to try and benchmark rather than something that's guaranteed to be faster.

Since there are only 64 possible values, you can just use something like:

int getSetBit (unsigned long x) {
    if (x == 0x8000000000000000UL) return 63;
    if (x == 0x4000000000000000UL) return 62;
    if (x == 0x2000000000000000UL) return 61;
    if (x == 0x1000000000000000UL) return 60;
    if (x == 0x0800000000000000UL) return 59;
    if (x == 0x0400000000000000UL) return 58;
    :
    if (x == 0x0000000000000002UL) return  1;
                                   return  0;
}

You may find that's faster but solutions will generally be affected by a whole lot of things outside the scope of the standard (optimisation strategies, data caching, pipelining and so forth).


If you're willing to move away from Standard C, many environments will have optimised things you can use, such as the gcc:

int __builtin_ffs (unsigned int x)
// Returns one plus the index of the least significant
//   1-bit of x, or if x is zero, returns zero.

Of course, with that one, you may have to split the long into two int types and check each one individually, something like (untested but you should get the general idea):

if (x < 0x80000000UL) return __builtin_ffs((unsigned int)x) - 1;
return __builtin_ffs((unsigned int)(x>>32)) -1 + 32;

Alternatively, the output from __builtin_clzl() can be manipulated to give you the bit position (it gives you the leading zero count) and it works with unsigned long already. You can see the gcc built-ins here.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Did you mean `==` instead of `=`? – Mysticial Feb 21 '13 at 08:56
  • 1
    @Mysticial: yes, you're absolutely right. I can't believe I've been coding in C since the late 70s and _still_ occasionally stuff things up. Maybe I'm getting senile :-) – paxdiablo Feb 21 '13 at 08:58
  • I have read somewhere that a switch is a bit faster, but that's not important. I'll try this approach and benchmark. – Mazyod Feb 21 '13 at 08:59
  • @Mazyod, try _any_ approach but the benchmarking bit is important - and it should be tested on the target platform in case there are differences. – paxdiablo Feb 21 '13 at 09:00
  • `__builtin_ctzl()` gives you the *trailing* zero count (as you'd expect from the name), which is actually what the OP wants directly. Although I would use `__builtin_ctzll()` as long long must be at least 64 bits. – caf Feb 21 '13 at 09:15
  • @caf, you're right, it should have been clzl, fixed now. I don't think you need the `ll` variant since the question states that longs are 64 bits on the given platform, and the `__builtin_clzl` takes unsigned long. Any attempt at portability is probably long gone out the window at this point :-) – paxdiablo Feb 21 '13 at 09:28