I have a function:
inline uint32_t ShiftOf(uint32_t v)
{
for (uint32_t s = 0; s < 32; ++s)
{
if (v == 1 << s)
return s;
}
return -1;
}
Is there any way to optimize it?
I have a function:
inline uint32_t ShiftOf(uint32_t v)
{
for (uint32_t s = 0; s < 32; ++s)
{
if (v == 1 << s)
return s;
}
return -1;
}
Is there any way to optimize it?
If you only need to detect if exactly one bit is set, not which one, you can optimize this significantly:
int is_power_of_two(uint32_t v) {
return v && !(v & (v - 1));
}
If you need to actually compute which bit is set, not just whether a single bit is set, you have a plethora of options (or you just cheat and use C99's log2
function after verifying it's a power of two and cast the result).
Short answer: Bookmark Bit Twiddling Hacks. It's handy.
There are some ways to optimize the functions.
Optimization with bitwise operators:
inline uint32_t ShiftOf(uint32_t v)
{
uint32_t s =
(bool)(v & 0xFFFF0000) * 16 +
(bool)(v & 0xFF00FF00) * 8 +
(bool)(v & 0xF0F0F0F0) * 4 +
(bool)(v & 0xCCCCCCCC) * 2 +
(bool)(v & 0xAAAAAAAA);
return v == 1 << s ? s : -1;
}
Optimization with compiler intrinsics:
inline uint32_t ShiftOf(uint32_t v)
{
#if defined(_MSC_VER)
DWORD s = 0;
if (!_BitScanForward(&s, v))
return -1;
#elif defined(__GNUC__)
uint32_t s = __builtin_ctz(v);
#else
# error This platform is unsupported!
#endif
return v == 1 << s ? s : -1;
}
Optimization with a hash table:
const uint32_t g_divider = 37;
uint32_t g_hash[g_divider] = { 0 };
static void InitHash()
{
for (uint32_t s = 0; s < 32; ++s)
g_hash[(1 << s) % g_divider] = s;
}
inline uint32_t ShiftOf(uint32_t v)
{
uint32_t s = g_hash[v % g_divider];
return v == 1 << s ? s : -1;
}
I'm not sure but you can unroll the loop:
inline uint32_t ShiftOf(uint32_t v)
{
switch (v)
{
case 0x00000001: return 0;
case 0x00000002: return 1;
case 0x00000004: return 2;
case 0x00000008: return 3;
case 0x00000010: return 4;
case 0x00000020: return 5;
case 0x00000040: return 6;
case 0x00000080: return 7;
case 0x00000100: return 8;
case 0x00000200: return 9;
case 0x00000400: return 10;
case 0x00000800: return 11;
case 0x00001000: return 12;
case 0x00002000: return 13;
case 0x00004000: return 14;
case 0x00008000: return 15;
case 0x00010000: return 16;
case 0x00020000: return 17;
case 0x00040000: return 18;
case 0x00080000: return 19;
case 0x00100000: return 20;
case 0x00200000: return 21;
case 0x00400000: return 22;
case 0x00800000: return 23;
case 0x01000000: return 24;
case 0x02000000: return 25;
case 0x04000000: return 26;
case 0x08000000: return 27;
case 0x10000000: return 28;
case 0x20000000: return 29;
case 0x40000000: return 30;
case 0x80000000: return 31;
default: return -1;
}
}
If you want maximum speed (at the expense of less code portability), modern processors have optimized instructions for that, that are exposed through different "intrinsic function names" depending on the compiler.
The most common name for this operation is BSR (Bit Scan Reverse = find the index of the most significant bit). Under MSVC, it can be generated with the intrinsic functions _BitScanReverse, resp. _BitScanReverse64 (that take an additional mask argument)
More references on the following webpage: https://en.wikipedia.org/wiki/Find_first_set
You can count the number of bits set and see if it's 1.
Consult the Fast Bit Counting page and you'll find many different routines for counting the number of bits quickly.
The fastest there is 4b. Precompute-16bit (though this is in C):
static char bits_in_16bits [0x1u << 16];
int bitcount (unsigned int n) {
// works only for 32-bit ints
return bits_in_16bits [n & 0xffffu]
+ bits_in_16bits [(n >> 16) & 0xffffu];
}
Precompute_16bit is a variant of Precompute_8bit in that an array bits_in_16bits[] stores the number of 1 bits in successive 16 bit numbers (shorts).
But this version takes a fair amount of memory. You can experiment with the various versions to find one that suits your needs.