
I tried to count up by 1 billion from an arbitrary input number using Numba, to see how much slower it is than C code. I accidentally found that it takes around 80 microseconds (not seconds) on my setup with Python 3.8. As you can see below, I tried to slow it down with a conditional and an extra step, but that had no effect. I also did not specify a type signature for the argument or the return value.

As I understand it, the decorator `@njit(cache=True)` compiles the Python function through LLVM into native machine code, which is then cached and reused whenever the decorated function is called.

The following function runs in far less time than even 1 billion clock cycles would take. Is the function analyzed in depth and then safely transformed into an equivalent, faster function?

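```python
from numba import njit

@njit(cache=True)  # compile to native code and cache the result on disk
def countit(count):
    end = count + 1000000000
    while count < end:
        if count%2:
            count += 1  # odd: +1
        else:
            count += 2  # even: +2, then -1 below
            count -= 1
    return count
```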
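Tracing the two branches by hand shows why an optimizer is allowed to drop the loop: the odd branch adds 1 and the even branch adds 2 and then subtracts 1, so every iteration is a net +1 and the loop stops exactly at `end`. The sketch below is plain Python (so it runs without Numba) and checks that equivalence for small inputs; `countit_loop`, `countit_closed_form` and the `steps` parameter are hypothetical names introduced for illustration. A safe rewrite must also preserve the case where the loop body never runs, which is why the closed form uses `max`.

```python
def countit_loop(count: int, steps: int = 1_000_000_000) -> int:
    """Pure-Python copy of the loop above, with the step count as a parameter."""
    end = count + steps
    while count < end:
        if count % 2:
            count += 1
        else:
            count += 2
            count -= 1
    return count


def countit_closed_form(count: int, steps: int = 1_000_000_000) -> int:
    """Loop-free equivalent (Python integers, so no overflow to worry about)."""
    # Each iteration is a net +1, so the loop ends with count == end.
    # If steps <= 0 the body never runs and count is returned unchanged.
    return max(count, count + steps)
```

With `steps` left at its default, `countit_closed_form(count)` is just `count + 1000000000`, which is the rewrite suggested in the comments.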

I made sure the result wasn't cached when doing the timings: if you compile the function and then run it just once with a brand-new argument, it still finishes in far less than a millisecond.
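One way to keep JIT compilation out of such a measurement is to call the function once as a warm-up and only time later calls; for a Numba dispatcher, the first call with a new argument type triggers compilation (or a cache load). This is a minimal sketch; `time_excluding_warmup` is a hypothetical helper, not part of Numba, and it reports the best of several runs using `time.perf_counter`.

```python
import time
from typing import Any, Callable, Tuple


def time_excluding_warmup(fn: Callable[..., Any], *args: Any,
                          repeat: int = 5) -> Tuple[Any, float]:
    """Return (result, best wall-clock seconds) for fn(*args), excluding a warm-up call."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    # Warm-up: for a Numba-jitted function this is where compilation happens,
    # so it is deliberately left out of the timed runs.
    fn(*args)
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return result, best
```

Usage with the function from the question would be `result, seconds = time_excluding_warmup(countit, 12345)`. Taking the minimum rather than the mean reduces noise from other processes.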

Larry Panozzo
  • Since `count += 2; count -= 1` is equivalent to `count += 1`, the loop can be avoided and you can simply return `count + 1000000000`. – Mitch Wheat Nov 04 '22 at 01:37
  • The question is, does numba or llvmlite do this analysis of the entire loop and compile an equivalent function that returns count + 1000000000? Is that what is saved in the cache? – Larry Panozzo Nov 04 '22 at 03:33
  • Yes, there is some dead code elimination on the LLVM side. Optimizations done by clang (-O3 -march=native) are usually also done by Numba, since the LLVM backend is the same. You can probably see this in the optimized LLVM IR: `print(countit.inspect_llvm(countit.signatures[0]))`. Also avoid including compilation times. – max9111 Nov 04 '22 at 09:05
  • If you see something like https://github.com/numba/numba/issues/8172 (much slower than equivalent C compiled with clang), it makes sense to suspect a bug. – max9111 Nov 04 '22 at 11:31
