
Here is a minimal example:

from numba import jit
import numba as nb
import numpy as np

@jit(nb.float64[:, :](nb.int32[:, :]))  # eager compilation: explicit signature
def go_fast(a): 
    trace = 0.0
    for i in range(a.shape[0]):  
        trace += np.tanh(a[i, i]) 
    return a + trace          

@jit  # lazy compilation: signature inferred on first call
def go_fast2(a): 
    trace = 0.0
    for i in range(a.shape[0]):  
        trace += np.tanh(a[i, i]) 
    return a + trace 

Running in Jupyter:

x = np.arange(10000).reshape(100, 100)
%timeit go_fast(x)
%timeit go_fast2(x)

leads to

5.65 µs ± 27.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

3.8 µs ± 46.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Why does eager compilation lead to slower execution?

Gideon Kogan
1 Answer


Knowing that the memory accesses are contiguous simplifies the life of an optimizer (there is a similar example for Cython, but the same holds for numba, even if clang is often more clever than gcc).

Your example seems to be such a case:

  1. Without eager compilation, numba will detect that the data is C-contiguous and utilize this, e.g. for vectorization.
  2. With eager compilation and the signature above, you don't provide this information, so the optimizer must take into account that the memory accesses could be non-contiguous, and it will create jit-code which is less performant than the first version (see the sketch after this list).
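
One way to check this (a minimal sketch; the exact type strings depend on your platform and Numba version) is to look at the signature the lazy dispatcher records after its first call:

# go_fast2 as defined in the question (lazy @jit, no signature)
x = np.arange(10000).reshape(100, 100)

go_fast2(x)                   # the first call triggers type inference + compilation
print(go_fast2.signatures)    # e.g. [(array(int64, 2d, C),)] -- note the 'C' layout
print(nb.typeof(x))           # array(int64, 2d, C): the array is seen as C-contiguous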

Thus, you should provide a more precise signature:

@jit(nb.float64[:, ::1](nb.int32[:, ::1])) 
def go_fast3(a): 
    trace = 0.0
    for i in range(a.shape[0]):  
        trace += np.tanh(a[i, i]) 
    return a + trace

[:, ::1] tells numba that the data will be C-contiguous, and once this information is utilized:

x = np.arange(10000).astype(np.int32).reshape(100, 100)
%timeit go_fast(x)     # 15.6 µs ± 241 ns per loop
%timeit go_fast2(x)    # 8.2 µs ± 90.7 ns per loop
%timeit go_fast3(x)    # 8.2 µs ± 49.6 ns per loop

there is no difference for the eagerly compiled version.
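
The more precise signature also restricts the accepted inputs: with an explicit signature, numba does not compile further specializations, so calling go_fast3 with a non-contiguous view should raise a TypeError rather than silently falling back (a minimal sketch, assuming the default @jit behaviour with explicit signatures):

y = x[::2, ::2]               # non-contiguous view of the int32 array above
print(nb.typeof(y))           # array(int32, 2d, A) -- 'A' = unknown layout

try:
    go_fast3(y)
except TypeError as e:        # no matching definition for the 'A' layout
    print(e)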

ead
  • Where can I find the guidelines for the syntax that you have used? – Gideon Kogan Mar 30 '21 at 05:33
  • @GideonKogan e.g. here: https://numba.pydata.org/numba-doc/dev/reference/types.html#arrays – ead Mar 30 '21 at 06:47
  • There are some things which could be mentioned additionally. If Numba detects non-contiguous arrays in jit-mode, a function isn't recompiled if it is called with contiguous arrays after that. This is also a feature which often leads to confusion (see the sketch after these comments). – max9111 Mar 30 '21 at 11:57
  • You say "even if clang is often more clever than gcc", but shouldn't you say "numba" or "LLVM" instead of "clang"? clang is not used here, right? – Jens Renders Apr 04 '21 at 14:10
  • @JensRenders llvm would be the most correct. Numba and clang are frontends using llvm. With Cython one uses the clang frontend or gcc; this is the reason I used this pair. – ead Apr 04 '21 at 14:22
  • Ah okay, I thought the clang was referring to numba, my bad. – Jens Renders Apr 04 '21 at 14:23
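
To illustrate max9111's point, here is a minimal sketch (go_fast4 is a hypothetical copy of go_fast2 used only for this demo; the exact dispatch behaviour can depend on the Numba version): if the first call happens with a non-contiguous view, the lazily compiled specialization uses the generic 'A' layout, and later calls with contiguous arrays reuse that slower specialization instead of triggering a new, contiguity-aware compilation:

from numba import jit
import numpy as np

@jit                                  # lazy compilation, as in go_fast2
def go_fast4(a):
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

x = np.arange(40000).reshape(200, 200)

go_fast4(x[::2, ::2])                 # first call with a non-contiguous view ('A' layout)
go_fast4(x)                           # contiguous call reuses the existing specialization
print(go_fast4.signatures)            # only one signature, with layout 'A'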