
I'm trying to learn Numba to speed up my Python code, and I plan to use the eager compilation mode of Numba JIT.

The commonly used (lazy) JIT mode can be enabled with the following code:

import numpy as np
from numba import jit

@jit(nopython=True)
def jit_go_fast_lazy(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

When the program above is run, the first execution of jit_go_fast_lazy() will be slow since compilation happens at call time; the executions afterwards will be tremendously faster.

Since my program is sensitive to execution time, it's essential to reduce the compilation overhead, so I plan to use the eager compilation mode, which, according to the official tutorial, should look like this:

from numba import float64, int64

@jit(float64[:, :](int64[:, :]), nopython=True)
def jit_go_fast_eager(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

(The code examples here are adapted from the Numba official site.)

To test the running time of the code above, I use the following:

import time

x = np.arange(100).reshape(10, 10)

start = time.time()
jit_go_fast_lazy(x)
end = time.time()
print("Elapsed = %s" % (end - start))

start = time.time()
jit_go_fast_lazy(x)
end = time.time()
print("Elapsed = %s" % (end - start))

start = time.time()
jit_go_fast_eager(x)
end = time.time()
print("Elapsed = %s" % (end - start))

start = time.time()
jit_go_fast_eager(x)
end = time.time()
print("Elapsed = %s" % (end - start))

When running this test on macOS with Python 3.9.1, the results are as follows:

First run of JIT lazy compilation: 0.14231 sec
Second run of JIT lazy compilation: 3.0994e-06 sec

First run of JIT eager compilation: 8.8930e-05 sec
Second run of JIT eager compilation: 1.9073e-06 sec

From what I understand, the running time of jit_go_fast_eager() should be approximately the same whether it is the first call in my program or not (since the function has already been compiled). However, the results show that the second execution of jit_go_fast_eager() was roughly 45 times faster than the first.

I did some searching on this confusing result and found this question. I changed the decorator of jit_go_fast_eager() to:

@jit(float64[:, ::1](int64[:, ::1]), nopython=True)

The result is:

First run of JIT lazy compilation: 0.14407 sec
Second run of JIT lazy compilation: 3.0994e-06 sec

First run of JIT eager compilation: 6.0081e-05 sec
Second run of JIT eager compilation: 2.1458e-06 sec

There is only a minor change in the running times.

Can someone kindly give some help on this? My main questions are:

  • Why is there a huge difference in running time between the first and second executions of the JIT eager-compiled function?
  • What's the best practice for accelerating a function like this? While the second execution is always much faster, the first execution can interfere with the overall process of the program.

Thanks!

Lang Zhou
  • Check the function signature of your first function. There the arrays are C-contiguous. You are declaring non-contiguous arrays in your second function. `@jit(float64[:, ::1](int64[:, ::1]), nopython=True)` should be equivalent to your first implementation. Also try fastmath=True on things like summation. If the program is sensitive to execution time, cache=True makes more sense anyway; the first load from cache takes a bit longer. For benchmarking, use two independent functions and benchmark the second or third one (see the first sketch below). – max9111 May 17 '21 at 11:53
  • If you need to call a JIT function at the granularity of 1e-6 seconds, there is obviously a problem in the method. Numba is not designed for low-latency calls, nor is Python. If you want to reduce the overhead, you need to use the Numba JIT in the caller function (see the second sketch below). See [this question](https://stackoverflow.com/questions/67520285) for more information. – Jérôme Richard May 17 '21 at 12:13
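
A sketch of what the first comment suggests, combining the C-contiguous eager signature with cache=True and fastmath=True (the name jit_go_fast_cached and this exact option combination are illustrative, not from the original post):

from numba import jit, float64, int64
import numpy as np

# C-contiguous eager signature (::1 on the innermost axis) plus the
# options from the comment: cache=True persists the compiled code on
# disk across runs; fastmath=True relaxes strict IEEE 754 semantics so
# the summation can be reordered/vectorized.
@jit(float64[:, ::1](int64[:, ::1]), nopython=True, cache=True, fastmath=True)
def jit_go_fast_cached(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace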
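
And a sketch of what the second comment suggests: if the function is called many times, move the loop into a jitted caller so the Python-to-Numba dispatch overhead is paid once rather than on every call (call_repeatedly and its accumulation are hypothetical):

from numba import njit

# Hypothetical jitted caller: the repeated calls to jit_go_fast_eager
# happen entirely inside compiled code, avoiding the per-call dispatch
# overhead of calling a jitted function from the Python interpreter.
@njit
def call_repeatedly(a, n):
    acc = 0.0
    for _ in range(n):
        acc += jit_go_fast_eager(a)[0, 0]
    return acc

# e.g. call_repeatedly(x, 1000)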

0 Answers