Modern processors are able to compute the log2 of an integer in only a few cycles using specific low-level instructions (e.g. bsr on mainstream x86-64 processors). Based on this great previous post, one can compute the log10 of an integer very quickly. The idea is to use a lookup table to translate the log2 into the log10. Once the log10 has been computed, one can just use another lookup table to perform the division by 10 ** log10(number). However, non-constant 64-bit divisions are very expensive on almost all processors. An alternative solution is to use a switch with all the possible cases so the compiler can generate efficient code for each of them and use a fast jump table. Indeed, a division by a constant can be optimized by the compiler into a few fast instructions (i.e. multiplications and shifts) that are much faster than a non-constant division. The resulting code is not very beautiful/simple though. Here it is:
#include <assert.h>

static inline unsigned int baseTwoDigits(unsigned long long x) {
    return x ? 64 - __builtin_clzll(x) : 0;
}

static inline unsigned int baseTenDigits(unsigned long long x) {
    static const unsigned char guess[65] = {
        0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4,
        5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 9,
        10, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15,
        15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19
    };
    static const unsigned long long tenToThe[] = {
        1ull, 10ull, 100ull, 1000ull, 10000ull, 100000ull, 1000000ull,
        10000000ull, 100000000ull, 1000000000ull, 10000000000ull,
        100000000000ull, 1000000000000ull, 10000000000000ull,
        100000000000000ull, 1000000000000000ull, 10000000000000000ull,
        100000000000000000ull, 1000000000000000000ull,
        10000000000000000000ull
    };
    unsigned int digits = guess[baseTwoDigits(x)];
    return digits + (x >= tenToThe[digits]);
}

static inline int optimized(unsigned long long number)
{
    const unsigned int intLog = baseTenDigits(number);
    switch (intLog)
    {
        case 0: return number;
        case 1: return number;
        case 2: return number;
        case 3: return number;
        case 4: return number / 10ull;
        case 5: return number / 100ull;
        case 6: return number / 1000ull;
        case 7: return number / 10000ull;
        case 8: return number / 100000ull;
        case 9: return number / 1000000ull;
        case 10: return number / 10000000ull;
        case 11: return number / 100000000ull;
        case 12: return number / 1000000000ull;
        case 13: return number / 10000000000ull;
        case 14: return number / 100000000000ull;
        case 15: return number / 1000000000000ull;
        case 16: return number / 10000000000000ull;
        case 17: return number / 100000000000000ull;
        case 18: return number / 1000000000000000ull;
        case 19: return number / 10000000000000000ull;
        case 20: return number / 100000000000000000ull;
        default: assert(0); return 0;
    }
}
Note that this code uses the non-standard compiler built-in __builtin_clzll, available on both Clang and GCC. For MSVC, please read this post.
[Update] The previous benchmark did not inline the code of the proposed function, as opposed to the others, resulting in a slower execution (especially on Clang). Using static inline helped the compiler to properly inline the function calls.
Results
Here are the results of the methods on QuickBench, respectively on GCC and Clang (with -O3). Note that the input distribution is chosen so that the logarithms of the values are quite uniform and pseudo-random (i.e. log-uniform). This choice has been made because the interviewer said a binary search was a good solution, and this distribution is the best one for such an algorithm.


One can see that this solution is the fastest. The solutions of Yakov Khodorkovski and qqNade give wrong results for big values due to floating-point rounding. The one of qqNade is not presented in the benchmark for the sake of clarity, as it is >10 times slower than the original one.
The reason why the solution of Gerhardh is so fast with Clang is that the compiler is able to partially generate fast conditional moves instead of slow conditional branches. This optimization is insanely clever since it is only possible on 32-bit integers (and only if the division-by-constant optimization is performed first), but Clang is able to deduce that n is small enough after the first 2 conditions! That being said, this optimization is fragile: a few changes in the code often appear to break it.
One can note that the original code is surprisingly quite fast (especially on GCC). This is due to branch prediction. Modern processors execute many iterations speculatively, without checking whether they should actually be executed (and roll back if needed). Each iteration is very fast since the division by a constant is optimized: it takes only 2 cycles per iteration on my machine. On modern x86-64 Intel processors, a branch misprediction takes about 14 cycles while a well-predicted branch takes only 1-2 cycles (it is similar on AMD Zen processors). The average number of iterations is ~9 and only the last iteration is expensive. The solution of Gerhardh executes far fewer instructions, but it can result in up to 4 mispredicted branches with GCC and up to 2 with Clang. The proposed solution results in only 1 possibly-mispredicted indirect branch (which processors have a harder time executing efficiently, though). Since the optimized implementations run in only ~10 cycles on average on QuickBench, the effect of branch misprediction is huge.
Note that using other input distributions has an impact on the results, though the overall trend remains the same. Here are the results for a uniform distribution: GCC and Clang. The original algorithm is significantly slower since the average number of digits is twice as large (17-18 instead of ~9), and so is the number of iterations. The speed of the other algorithms is not very different from the previous distribution, and the trend is left unchanged overall.
Conclusion
To conclude, the solution of Gerhardh is relatively portable, simple and pretty fast. The new proposed solution is more complex, but it is the fastest on both GCC and Clang. Thus, the solution of Gerhardh should be preferred unless the performance of this function is critical.