from numpy.linalg import inv
from numpy import array

def test(A, U, m):
    # For each slice A[x], compute the similarity transform inv(U) @ A[x] @ U
    Lambda = array([inv(U) @ A[x] @ U for x in range(m * m)])
    return Lambda

This is a simple piece of code that calculates an array of matrix products. How can I write this in Cython with the loop parallelized using prange? I have tried many times, but prange requires nogil, and inv() needs the GIL. How can this be done efficiently so that it is faster than my original code?

ABZANMASTER

1 Answer

  1. You rarely actually want to compute the inverse. inv(U)@A[x] is better computed as np.linalg.solve(U, A[x]). That is almost certainly quicker and more numerically accurate. From a quick test on my PC you might expect a 40% speedup from that alone.

  2. You calculate the inverse inside a loop, but it does not depend on the loop variable. Consider moving the calculation of the inverse outside the loop and doing it once (or using solve...). You should fix this kind of basic mistake before worrying about parallelizing anything. A minimal sketch of points 1 and 2 is shown after this list.

  3. The last time you asked this question you were pointed towards a different question. Essentially Scipy (not Numpy) comes with wrappers for BLAS and LAPACK. You can cimport these to get direct, GIL-free access to these low-level matrix algebra functions.

    I don't plan on giving a full worked example here (laziness...) but you can use dgesv for solve (assuming that both U and A[x] are double precision floating point matrices... you don't say) and dgemm for the matrix multiplication (again, assuming double-precision floating point matrices). If you insist on calculating the inverse then I think you need to go through its LU factorization.

    These functions are obviously considerably more complex than just using the Numpy functions. To get the pointers to pass to them you'll probably want to take a contiguous memoryview of your array (cdef double[:,::1] Umview = U) and then get the pointer to the 0,0 element (&Umview[0,0]). A rough sketch of this pattern follows below.
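
To make points 1 and 2 concrete, here is a minimal pure-Numpy sketch (no Cython at all), assuming A is a stack of shape (m*m, n, n) and U is (n, n) as in the question; test_hoisted and test_solve are just illustrative names:

import numpy as np
from numpy.linalg import inv, solve

def test_hoisted(A, U):
    # Point 2: compute the inverse once, outside any loop; the @ operator
    # then broadcasts over the leading axis of A, so no Python loop is needed.
    Uinv = inv(U)
    return Uinv @ A @ U

def test_solve(A, U):
    # Point 1: avoid forming the inverse at all; solve(U, A) broadcasts over
    # the stack of matrices in A and returns inv(U) @ A[x] for every x.
    return solve(U, A) @ U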
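
And as a rough illustration of the cimport-and-pointer pattern from point 3 (not a full worked example: it only wraps a single dgesv call, error handling is omitted, and the Fortran-ordered memoryviews are an assumption because LAPACK expects column-major storage):

# cython: language_level=3
import numpy as np
from scipy.linalg.cython_lapack cimport dgesv

def solve_inplace(double[::1, :] U, double[::1, :] B):
    # Solves U @ X = B in place: on return B holds X and U holds its LU factors.
    # Both arguments must be Fortran-ordered (e.g. prepared with np.asfortranarray).
    cdef int n = U.shape[0]
    cdef int nrhs = B.shape[1]
    cdef int info = 0
    cdef int[::1] ipiv = np.empty(n, dtype=np.intc)
    cdef double *u_ptr = &U[0, 0]
    cdef double *b_ptr = &B[0, 0]
    cdef int *ipiv_ptr = &ipiv[0]
    # dgesv is declared nogil, so this call is the part that could eventually
    # sit inside a prange loop.
    with nogil:
        dgesv(&n, &nrhs, u_ptr, &n, ipiv_ptr, b_ptr, &n, &info)
    return info

For complex matrices the zgesv and zgemm variants play the same role.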

DavidW
  • Another point: the matrix multiplication (and probably the inverse) in an efficient BLAS is already parallel, so using parallelism on top of that will certainly not speed up the computation. This is the case for OpenBLAS, which is the default implementation on most platforms. In fact, it should be slower. The number of threads can often be controlled, but it is generally better to use the default setting unless there are many tiny matrices to compute (which is not mentioned by the OP). Not to mention Numpy releases the GIL internally. – Jérôme Richard May 20 '22 at 17:45
  • @JérômeRichard That's a very good point. I tried matrix multiplication with Numpy on my PC and it didn't appear to run in parallel, so I didn't include it in my answer. However, I'm sure there are things I could change to make it parallel. Getting BLAS to work in parallel is definitely a much better solution than trying to write something yourself in Cython! – DavidW May 20 '22 at 17:48
  • Thanks for your answer @DavidW. U is a 2D complex matrix and A is a 3D complex array. I have to calculate the product inv(U)@A[x]@U, where U's dimensions are n x n and A's are m^2 x n x n. I did it in Python and I want to speed it up with Cython using parallel programming. Is this going to speed up my code? Also, can you give me an example of how to use dgesv and dgemm on 2D arrays? – ABZANMASTER May 21 '22 at 14:41
  • I think you're ignoring all the really basic easy suggestions (points 1&2, and Jérôme Richard's suggestion that it is probably already parallelized). For double complex matrices you want `zgesv` and `zgemm` instead. I can't give an example - using these functions is fairly complicated, it'd probably take me a couple of hours to get it right, and I don't really have the time or interest. – DavidW May 21 '22 at 15:10
  • @DavidW I do not want to parallelize the matrix multiplication itself. I just want to compute the product inv(U)@A[x]@U in parallel: on every thread of my PC I want to calculate this product n times. Did I not understand something correctly? – ABZANMASTER May 22 '22 at 11:41
  • The point was that you don't gain anything from running `inv(U)@A[x]@U` in parallel if the `inv` and the matrix multiplications are already running in parallel. – DavidW May 22 '22 at 12:58