Optimization of detecting integer with only 1 bit set

Question

I have a function:

inline uint32_t ShiftOf(uint32_t v)
{
    for (uint32_t s = 0; s < 32; ++s)
    {
        if (v == 1 << s)
            return s;
    }
    return -1;
}

Is there any way to optimize it?

@PeteBecker he probably meant "complete" log2: if log2(v) is integer then return it, else return -1. — jingyu9575, Nov 04 '15 at 13:18
@LưuVĩnhPhúc That question is asking for a way to check *if* any single bit is set, ignoring position. This question is asking for a way to determine *which* bit is set, apparently assuming only a single bit is set in the input. Related, but not duplicate. — Bob, Nov 04 '15 at 15:38
@Bob the question is about "detecting integer with only 1 bit set", not which bit is set, although I didn't read the code. And even if it's about asking which bit is set, there are other duplicates as well: [Determine which single bit in the byte is set](http://stackoverflow.com/q/14429661/995714), [How to find the position of the only-set-bit in a 64-bit value using bit manipulation efficiently?](http://stackoverflow.com/q/32339078/995714) — phuclv, Nov 04 '15 at 15:43
@LưuVĩnhPhúc Someone else edited the title; that was not the original title. Those other two questions you linked are much closer, and many (but not all!) of those answers can be adapted with some effort, but not identical (when playing with bit manipulation, the width of the input is significant. One wants a byte, the other wants a 64-bit value, this question wants a `uint32_t`, a 32-bit input). — Bob, Nov 04 '15 at 15:49
for 32-bit case: [Position of least significant bit that is set](http://stackoverflow.com/q/757059/995714) — phuclv, Nov 04 '15 at 15:51

score 25 · Answer 1 · answered Nov 04 '15 at 13:10

If you only need to detect if exactly one bit is set, not which one, you can optimize this significantly:

int is_power_of_two(uint32_t v) {
    return v && !(v & (v - 1));
}

If you need to actually compute which bit is set, not just whether a single bit is set, you have a plethora of options (or you just cheat and use C99's log2 function after verifying it's a power of two and cast the result).

Short answer: Bookmark Bit Twiddling Hacks. It's handy.

score 22 · Accepted Answer · edited Nov 04 '15 at 12:25

There are some ways to optimize the functions.

Optimization with bitwise operators:

inline uint32_t ShiftOf(uint32_t v)
{
    uint32_t s =
        (bool)(v & 0xFFFF0000) * 16 +
        (bool)(v & 0xFF00FF00) * 8 +
        (bool)(v & 0xF0F0F0F0) * 4 +
        (bool)(v & 0xCCCCCCCC) * 2 +
        (bool)(v & 0xAAAAAAAA);
    return v == 1 << s ? s : -1;
}

Optimization with compiler intrinsics:

inline uint32_t ShiftOf(uint32_t v)
{
#if defined(_MSC_VER)
    DWORD s = 0;
    if (!_BitScanForward(&s, v))
        return -1;
#elif defined(__GNUC__)
    uint32_t s = __builtin_ctz(v);
#else
#    error This platform is unsupported!
#endif
    return v == 1 << s ? s : -1;
}

Optimization with a hash table:

const uint32_t g_divider = 37;
uint32_t g_hash[g_divider] = { 0 };
static void InitHash()
{
    for (uint32_t s = 0; s < 32; ++s)
        g_hash[(1 << s) % g_divider] = s;
}

inline uint32_t ShiftOf(uint32_t v)
{
    uint32_t s = g_hash[v % g_divider];
    return v == 1 << s ? s : -1;
}

I would generally recommend compiler intrinsic, as they are higher-level and thus (in general) more likely to generate optimized code for the target. — Matthieu M., Nov 04 '15 at 18:18

score 8 · Answer 3 · answered Nov 04 '15 at 10:32

I'm not sure but you can unroll the loop:

inline uint32_t ShiftOf(uint32_t v)
{
    switch (v)
    {
    case 0x00000001: return 0;
    case 0x00000002: return 1;
    case 0x00000004: return 2;
    case 0x00000008: return 3;
    case 0x00000010: return 4;
    case 0x00000020: return 5;
    case 0x00000040: return 6;
    case 0x00000080: return 7;
    case 0x00000100: return 8;
    case 0x00000200: return 9;
    case 0x00000400: return 10;
    case 0x00000800: return 11;
    case 0x00001000: return 12;
    case 0x00002000: return 13;
    case 0x00004000: return 14;
    case 0x00008000: return 15;
    case 0x00010000: return 16;
    case 0x00020000: return 17;
    case 0x00040000: return 18;
    case 0x00080000: return 19;
    case 0x00100000: return 20;
    case 0x00200000: return 21;
    case 0x00400000: return 22;
    case 0x00800000: return 23;
    case 0x01000000: return 24;
    case 0x02000000: return 25;
    case 0x04000000: return 26;
    case 0x08000000: return 27;
    case 0x10000000: return 28;
    case 0x20000000: return 29;
    case 0x40000000: return 30;
    case 0x80000000: return 31;
    default: return -1;
    }
}

Isn't there a `gcc`|`clang` flag for this? `-funroll-loops` or something? — Blacklight Shining, Nov 04 '15 at 13:04
@BlacklightShining: This isn't a loop. Though I suppose unrolling a loop that tested each bit position in turn might produce a similar result in a clever compiler with optimizations turned up. — ShadowRanger, Nov 04 '15 at 13:12
Sorry, downvoted this answer by mistake. If you edit the answer that will remove the lock and I can fix it. — Pete Becker, Nov 04 '15 at 18:29
@ShadowRanger There's a loop in the question; a mention of unrolling said loop is made at the beginning of this answer. — Blacklight Shining, Nov 05 '15 at 06:43

BrunoLevy · Answer 4 · 2015-11-04T18:33:42.127

4

If you want maximum speed (at the expense of less code portability), modern processors have optimized instructions for that, that are exposed through different "intrinsic function names" depending on the compiler.

The most common name for this operation is BSR (Bit Scan Reverse = find the index of the most significant bit). Under MSVC, it can be generated with the intrinsic functions _BitScanReverse, resp. _BitScanReverse64 (that take an additional mask argument)

More references on the following webpage: https://en.wikipedia.org/wiki/Find_first_set

edited Nov 04 '15 at 18:33

answered Nov 04 '15 at 10:46

BrunoLevy

2,495
17
30

Whilst not a bad advice per se, link-only answers are not appreciated. Could you please sum-up the article you linked to? Maybe citing the most common compilers? – Matthieu M. Nov 04 '15 at 18:16
@Matthieu M., good point (I edited the post), thanks. – BrunoLevy Nov 04 '15 at 18:34

score 1 · Answer 5 · answered Nov 04 '15 at 19:21

You can count the number of bits set and see if it's 1.

Consult the Fast Bit Counting page and you'll find many different routines for counting the number of bits quickly.

The fastest there is 4b. Precompute-16bit (though this is in C):

static char bits_in_16bits [0x1u << 16];

int bitcount (unsigned int n)  {
   // works only for 32-bit ints

   return bits_in_16bits [n         & 0xffffu]
       +  bits_in_16bits [(n >> 16) & 0xffffu];
}

Precompute_16bit is a variant of Precompute_8bit in that an array bits_in_16bits[] stores the number of 1 bits in successive 16 bit numbers (shorts).

But this version takes a fair amount of memory. You can experiment with the various versions to find one that suits your needs.

Optimization of detecting integer with only 1 bit set

5 Answers5