I have to optimize some Python code with Cython. The code performs many vector and matrix dot products, which are the key target of the optimization.

The products form chains like A.B.C.D.e, where e can be a vector and A, B, C, D are matrices. I'm using NumPy to represent all the objects.

Originally, I used numpy.linalg.multi_dot. I read here that multi_dot can be more efficient than np.dot because it tries to optimize the order of operations. However, in my case the matrices are small, and using multi_dot is about 10x slower than doing a chain of np.dot calls.

However, the syntax of multi_dot is nice, and there are thousands of lines in the code doing products of different numbers of matrices and vectors. Rewriting all these lines using np.dot would really lose readability for other developers.

Creating a pure Python equivalent of multi_dot based on np.dot was easy, but I'd like to Cythonize it. The fact that I don't know the number of matrices given as arguments of multi_dot at compilation time forces me to call the Python interpreter. Here is the code I wrote:

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef mymultidot(np_array_list):
    cdef int n = len(np_array_list)
    cdef int i  # typing the loop counter avoids a Python-level loop
    R = np_array_list[0]
    for i in range(1, n):
        R = np.dot(R, np_array_list[i])
    return R
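For reference, the pure Python equivalent I mentioned can be written in one line with functools.reduce (a sketch; the name mymultidot_py is just for illustration):

```python
import functools

import numpy as np

def mymultidot_py(arrays):
    # Left-to-right chain of np.dot over a sequence of arrays.
    # Equivalent to the loop above; unlike np.linalg.multi_dot,
    # it makes no attempt to optimize the multiplication order.
    return functools.reduce(np.dot, arrays)
```

For small operands the left-to-right order costs essentially nothing, which is why this beats multi_dot's order-search overhead in my case.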

Is there a way to reduce Python calls in this function? In particular, can I avoid deducing the type of np_array_list at execution time?

Maybe I could declare the type of np_array_list at compile time, something like list_of( np.ndarray[np.float_t, ndim=2] )? But not knowing the size of the list makes this difficult.

Is there a way in Cython to have methods with the same name but different arguments, as in C/C++? Then I could create

mymultidot( np.ndarray[np.float_t, ndim=2], np.ndarray[np.float_t, ndim=2], np.ndarray[np.float_t, ndim=2])

mymultidot( np.ndarray[np.float_t, ndim=2], np.ndarray[np.float_t, ndim=2])

mymultidot( np.ndarray[np.float_t, ndim=2], np.ndarray[np.float_t, ndim=1])

... and so on?

It would be tedious to write all the versions used in the code, but then all multi_dot arguments would be known at compile time.
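To make the idea concrete: since all my matrices are 3x3, a fully typed version could fix every loop bound at compile time. Here is, in plain Python with a hypothetical name dot33, the kind of fixed-size product such a version would compute (in Cython the arguments would be typed memoryviews like double[:, :], so no Python calls would remain):

```python
import numpy as np

def dot33(A, B):
    # Fixed-size 3x3 matrix product with the inner sum unrolled.
    # All loop bounds are compile-time constants, which is what
    # would let Cython generate pure C for this function.
    R = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            R[i, j] = (A[i, 0] * B[0, j]
                       + A[i, 1] * B[1, j]
                       + A[i, 2] * B[2, j])
    return R
```

A chain of such fixed-size calls would avoid both the type inference at each call and NumPy's per-call dispatch overhead.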

Thanks in advance.

  • How many matrices do you have, how big are they, and are they constant for many function calls? – Nils Werner Apr 14 '18 at 15:46
  • Also you should probably add `cdef int i` to your code. Typing your loop counters is pretty helpful. – CodeSurgeon Apr 14 '18 at 15:52
  • I wonder whether this will be very useful. A few (hundred) iterations on a complex task that can't be fully 'cythonized' (no Python calls) won't show a big speed increase. I'd suggest programming the simplest case (e.g. a bunch of identical arrays of known size), and see if it helps enough to do more work on it. – hpaulj Apr 14 '18 at 17:01
  • There isn't a way to do function overloading. There is a feature called fused types though, which partially covers the use case for function overloading. That said, since your loop basically just calls numpy repeatedly, I'd be very surprised to see any kind of substantial speedup, particularly for big matrices. – ngoldbaum Apr 14 '18 at 17:42
  • @NilsWerner there are typically 2 to 4 matrices and one vector for each function call (and there are a lot of function calls). Each matrix is small: 3x3. They are rotation matrices in 3d space. If I summarize the other comments, there is no easy way to do what I wanted, and anyway it might not be worth the effort. – Vincent D. Apr 14 '18 at 17:56
  • Are you calling the multidot function in a loop (Function call overhead is really big on such small problems)? – max9111 Apr 16 '18 at 16:05
  • I'm calling the multi_dot function in a loop, yes. I don't think the function call itself is really time consuming, but the function mymulti_dot has to infer the type of the argument at each function call. I was hoping to avoid that and create a mymulti_dot where all the types are known at compile time. I think that would optimize the whole code. – Vincent D. Apr 22 '18 at 18:08
