
What is the most efficient way to find the logarithm to base 2 of a given number without using the math module?

math.log and math.log2 are float operations and therefore have inherent accuracy problems; they may give incorrect results, but they are very fast.

The number is guaranteed to be a power of 2; the function will only be executed if the following function returns True:

def power_of_2(n):
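    # n & (n - 1) clears the lowest set bit (e.g. 0b1000 & 0b0111 == 0),
    # so it is 0 exactly when n has a single set bit, i.e. n is a power of 2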
    return (n & (n-1) == 0) and n != 0

I want the result to be 100% accurate and I have already found a way to achieve this:

def log2(n):
    i = 0
    while n > 1:
        n >>= 1
        i += 1
    return i

Performance:

In [244]: def log2(n):
     ...:     i = 0
     ...:     while n > 1:
     ...:         n >>= 1
     ...:         i += 1
     ...:     return i

In [245]: log2(33554432)
Out[245]: 25

In [246]: import math

In [247]: math.log2(33554432)
Out[247]: 25.0

In [248]: %timeit log2(33554432)
2.37 µs ± 208 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [249]: %timeit math.log2(33554432)
125 ns ± 1.34 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

I found bit manipulation more suitable than float operations for this task. How can I improve the performance of the function while keeping the same logic?

Ξένη Γήινος
  • You can find the answer here https://stackoverflow.com/questions/13211137/get-logarithm-without-math-log-python – I_Al-thamary Aug 29 '21 at 07:36
  • @I_Al-thamary Are you sure "the most efficient way" is there? – no comment Aug 29 '21 at 07:44
  • `n >>= 1` is equal to `n //= 2`, so you're basically dividing the number until it reaches 1. If you're sure that the number is a power of two, you may use the following "hack": `return n.bit_length() - 1`. It works because in binary representation any power of 2 is a `1` followed by a certain number of `0`s: `2⁰ = 0b1`, `2¹ = 0b10`, `2² = 0b100`, ..., `2¹⁰ = 0b10000000000`, ... (see the sketch after these comments) – Olvin Roght Aug 29 '21 at 07:45
  • The accuracy is *not* superior; 2^25 = 33554432 – tripleee Aug 29 '21 at 07:46
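A minimal sketch of the `bit_length` approach from the comments (`log2_bits` is my name for it; it assumes n is a power of two, as the question guarantees):

def log2_bits(n):
    # For n == 2**k the binary representation is a 1 followed by k zeros,
    # so bit_length() is k + 1 and the exponent is bit_length() - 1.
    return n.bit_length() - 1

assert log2_bits(33554432) == 25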

2 Answers


I have tried running most of the methods from the answers in [1] and [2], and found that `power_of_2` and `easy` show good performance.

import math
import timeit

def myLog(x, b):
    if x < b:
        return 0
    return 1 + float(myLog(x / b, b))


def easy(x):
    return x.bit_length() - 1


def brute(x):
    # determine max p such that 2^p <= x
    p = 0
    while 2 ** p <= x:
        p += 1
    return p - 1


def log2_approx(val):
    from math import floor
    val = floor(val)
    approx = 0
    while val != 0:
        val &= ~ (1 << approx)
        approx += 1
    return approx


def log2(n):
    i = 0
    while n > 1:
        n >>= 1
        i += 1
    return i


def power_of_2(n):
    return (n & (n - 1) == 0) and n != 0

def log2_fast(n):
    # math.frexp(n) returns (m, e) with n == m * 2**e and 0.5 <= m < 1,
    # so for a power of two the exponent e is log2(n) + 1
    return math.frexp(n)[1] - 1
import time
from functools import partial

start_time = time.time()
math.log2(33554432)
print("math.log2 is %s microseconds ---" % ((time.time() - start_time) * 1000000))
start_time = time.time()
myLog(33554432.0, 2.0)
print(" myLog is %s microseconds ---" % ((time.time() - start_time) * 1000000))

start_time = time.time()
brute(33554432)
print(" brute is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))
start_time = time.time()
log2_approx(33554432) - 1  # log2_approx returns the exponent + 1, hence the - 1
print("log2_approx is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))
start_time = time.time()
log2_fast(33554432)
print(" log2_fast is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))

start_time = time.time()
log2(33554432)
print("log2 is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))

start_time = time.time()
power_of_2(33554432)
print("power_of_2 is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))
start_time = time.time()
easy(33554432)
print("easy is --- %s microseconds ---" % ((time.time() - start_time) * 1000000))

Running time

 math.log2 is 3.0994415283203125 microseconds ---
 myLog is 5.4836273193359375 microseconds ---
 brute is --- 6.4373016357421875 microseconds ---
 log2_approx is --- 6.4373016357421875 microseconds ---
 log2_fast is --- 1.6689300537109375 microseconds ---
 log2 is --- 2.1457672119140625 microseconds ---
 power_of_2 is --- 0.7152557373046875 microseconds ---
 easy is --- 0.476837158203125 microseconds ---

Using `timeit` (more representative than single-run timings):

test_items = 33554432
base = 2
# timeit returns total seconds for 1,000,000 runs, which is numerically
# the same as microseconds per call
times = timeit.Timer(partial(math.log2, test_items))
print("math.log2 is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(myLog, test_items, base))
print("myLog is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(easy, test_items))
print("easy is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(brute, test_items))
print("brute is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(log2_approx, test_items))
print("log2_approx is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(log2, test_items))
print("log2 is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(power_of_2, test_items))
print("power_of_2 is %s microseconds ---" % times.timeit(1000000))
times = timeit.Timer(partial(log2_fast, test_items))
print("log2_fast is %s microseconds ---" % times.timeit(1000000))

Running time

math.log2 is 0.05126429593656212 microseconds ---
myLog is 4.137887543998659 microseconds ---
easy is 0.10356121498625726 microseconds ---
brute is 5.254867412033491 microseconds ---
log2_approx is 3.81522585300263 microseconds ---
log2 is 1.7966924259671941 microseconds ---
power_of_2 is 0.1572157460032031 microseconds ---
log2_fast is 0.21886748599354178 microseconds ---
I_Al-thamary
  • `easy()` shows better results because it is the only "native" approach ([look](https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Objects/longobject.c#L5255)) – Olvin Roght Aug 29 '21 at 08:12
  • Benchmarks with single iteration are not representative. Consider using [`timeit`](https://docs.python.org/3/library/timeit.html). – Olvin Roght Aug 29 '21 at 08:20
  • @I_Al-thamary, benchmarks still require improvements – Olvin Roght Aug 29 '21 at 08:27

If your code needs to call this function repeatedly, and the inputs are guaranteed to be even integer powers of two (unlike your example!), I imagine it's going to be hard to beat a hard-coded dictionary lookup.

def precalculate_pow2(ceiling):
    i = 1
    result = dict()
    for n in range(ceiling):
        result[i] = n
        i <<= 1
    return result

powersof2 = precalculate_pow2(500) # or whatever your upper limit is

def log2(n):
    return powersof2[n]

Obviously, the loop to calculate the values is no more efficient than your current code, but memoizing it allows you to amortize that cost over all the calls to the log2 function.
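A quick sanity check of the lookup (these values follow directly from how the table is built):

>>> log2(33554432)
25
>>> log2(1)
0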

As a variation, you could start with the dictionary containing only one value, and only populate the missing parts on subsequent calls, using a recursive function.

powersof2 = {1: 0}

def log2(n):
    if n not in powersof2:
        powersof2[n] = 1 + log2(n >> 1)
    return powersof2[n]

This has the obvious drawback that it will recurse endlessly if you call it with zero or a negative value (though Python will then throw a RecursionError traceback once you exhaust the maximum recursion limit).
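If that matters for your inputs, one possible guard is a sketch like the following (it reuses `power_of_2` from the question; the name `log2_checked` and the choice of `ValueError` are my own):

powersof2 = {1: 0}

def log2_checked(n):
    # Reject zero, negatives and non-powers of two before recursing.
    if not power_of_2(n):
        raise ValueError("%r is not a positive power of 2" % n)
    if n not in powersof2:
        powersof2[n] = 1 + log2_checked(n >> 1)
    return powersof2[n]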

tripleee
  • Can you benchmark against `n.bit_length() - 1`? – no comment Aug 29 '21 at 08:04
  • @don'ttalkjustcode benchmark function call with dict lookup? Lookup will be faster for sure. – Olvin Roght Aug 29 '21 at 08:05
  • What benchmark makes sense will depend on the OP's use case. This is basically guaranteed to be slower for a single call, but will break even after a few dozen or so, I imagine. – tripleee Aug 29 '21 at 08:06
  • Why would the number of calls matter? – no comment Aug 29 '21 at 08:06
  • @don'ttalkjustcode, because it generates the mapping on the first "call" and just looks up the result later. – Olvin Roght Aug 29 '21 at 08:08
  • I'll grant you that precomputation for free :-). Just benchmark the lookup. – no comment Aug 29 '21 at 08:11
  • @don'ttalkjustcode, it will be faster. Significantly faster. – Olvin Roght Aug 29 '21 at 08:17
  • @OlvinRoght Which one? tripleee's? For how long, i.e., until what size? It's rather odd to unconditionally say that an O(log n) solution is faster than an O(1) solution :-) – no comment Aug 29 '21 at 08:25
  • @don'ttalkjustcode, tripleee's solution is `O(1)`, because it's a dictionary lookup. – Olvin Roght Aug 29 '21 at 08:27
  • @OlvinRoght No it's not. Don't forget that lookups need to confirm the hash match with an equality check. – no comment Aug 29 '21 at 08:28
  • @don'ttalkjustcode so you think that it will take more time than the function call overhead of `bit_length()`? Check [this](https://stackoverflow.com/q/1963507/10824407) question and take a look at the [`bit_length()` implementation](https://github.com/python/cpython/blob/bb3e0c240bc60fe08d332ff5955d54197f79751c/Objects/longobject.c#L5255). – Olvin Roght Aug 29 '21 at 08:30
  • Another variation might be to use `functools.lru_cache()`; then there's no need to precalculate a whole load of potentially unused values - you just pay the calculation once on first use (see the sketch after these comments). – Mark Setchell Aug 29 '21 at 08:31
  • @OlvinRoght At `n=2**499` (tripleee's chosen limit), the bit_length solution is already [more than twice as fast](https://tio.run/##ZVHRbsIwDHzPV1hCKE3XbRQ0aUzwJQihUlyI1CZR6gyhad/eOS1r0ZaXpuec7852N7pYs3p3vusqbxsg3aAm0I2znsCjw4KEaJGCgy1IKcUJK3Aey6IuQ10QHpy9LpMSda3NWX0I4KP5bd7fPLahJv496ZIS1WOV9WBAG/CFOeMf6kTa6T3zzIhq2GymthS8uT8Ugi2gb221ZMJ/b2@LhYIZsOr1wuAnerjZ4CE4x9daNzFwK8TMMH0JaQpf@C1iVtHaOpC2puXKrleWo9bO7GU2YOblqOlQoznTJVHwDDlX9twxRj1MUVccMkI4Qet1Bmwwg5xdDiPo1Sr56Eb2Bee1ocS00xjZX2w12pxmaEJz5HTb2JfPiMcFx/7DahNmZjCs94mFsztvO3zUSBuk00TOV6doUMIcEoLXX50Uclyr3hNFR72MyqIz9eBdia77AQ) – no comment Aug 29 '21 at 08:34
  • @OlvinRoght And at n=2**5000, it's already about 24 times (!) as fast as the dict lookup. What exactly am I supposed to see in that question about dict lookup complexity and in the bit_length implementation? – no comment Aug 29 '21 at 08:40
  • Sounds like you are still measuring a single call, where this very explicitly comes with the caveat that it will be faster only for a program which calls the function repeatedly. If you measure only after the lookup table is generated, I'm pretty sure this will always be faster. What would make sense is a benchmark where you loop over 1, 2, 4, 8, ... and measure at which point the lookup, *including* the overhead, outperforms your alternative. – tripleee Aug 29 '21 at 08:50
  • @tripleee I *never* measured just a single call. Are you confusing me with the author of the other answer? You can see in the code I linked to that I ran it 100000 times (and did that repeatedly). And that I, as granted earlier, precomputed your table *before* the benchmark. And it still lost. Please do look at the code and results that I linked to. – no comment Aug 29 '21 at 08:53
  • I can understand why it could be faster for larger numbers. Basically, `bit_length(num)` ends with `sizeof(num) * 8 - __builtin_clz(num)`, which is faster than any hash function. – Olvin Roght Aug 29 '21 at 09:05
  • @OlvinRoght Right, that's another reason that the dict lookup takes O(log n), not O(1). Though to see that one requires looking at the code. And a third reason that the size matters, this time the size of the *limit*, is that [the hash values repeat](https://tio.run/##NYxNCsIwEIX3OcXskpQiRRFEcOVBSgmpHWhmwmS68PQxAft27@d7@asb0@2RpdZVOEHgfY9BkakApsyi8OaDNIoxQQVep3XbUjZ3hWEA9LCyAAISyEKf6O7T5L3pYV/NY7ttUO/bxyVx0TlwSkzOPw00ZUFSZ/vajidkOYRDpNg/3xLFFIv1tf4A) and cause more and more collisions. – no comment Aug 29 '21 at 09:35
  • @don'ttalkjustcode, thanks for the lesson, not sure it was required though. But I think a disclaimer is required that in very rare cases precomputation is not faster. If your task can't be solved with 5 lines of assembly code, it's a good option. – Olvin Roght Aug 29 '21 at 10:57
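For completeness, a minimal sketch of the `functools.lru_cache()` variation Mark Setchell suggests in the comments (the recursive body is my assumption, not his code):

from functools import lru_cache

@lru_cache(maxsize=None)
def log2(n):
    # Each distinct power of two is computed once, then served from the cache.
    if n == 1:
        return 0
    return 1 + log2(n >> 1)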