Best performance method when solving 7000x7000 linear system with python

Question

I’m in the need of an efficient method to inverse a 7000x7000 aerodynamic influence coefficient (dense) matrix in python. I had started before a FORTRAN routine to handle the problem using the LU decompostition routines from LAPACK, which I had seen being used in other related apps quite efficiently. I’ve read, though, that the NumPy and SciPy linear system solvers are mostly based in direct calls to the same LAPACK/BLAS functions in C, and was wondering wether switching to FORTRAN would really reduce computation time to a level which justifies giving up an easier, higher level language.

If there are python solvers which would guarantee similar performance for matrixes of that size (1000 to 10000, square), which are they?

I really need the matrix inverse, so switching to iterative Ax=b solutions is not an option.

My suspcision is that if you use `numpy` with the MKL Lapack that you aren't going to miss Fortran. But the best way to answer this is to profile your code and see if the performance is acceptable. — juanpa.arrivillaga, Dec 23 '19 at 19:35
I would first focus on setting up a fast BLAS/LAPACK backend. [Related: scipy wiki](https://github.com/scipy/scipy/wiki/Dropping-support-for-Accelerate#which-minimum-lapack-version-to-support) and [Roadmap](http://scipy.github.io/devdocs/roadmap.html#evolve-blas-and-lapack-support) — sascha, Dec 23 '19 at 19:36
why not using [`R`](https://www.r-project.org/). To get the inverse just use the `solve` function. With this example you get the inverse of a matrix: `cant <- 1000*1000` `m <- matrix(rexp(cant, rate=.1), ncol=sqrt(cant))` `n = solve(m)` — Cedric Zoppolo, Dec 23 '19 at 19:38
@CedricZoppolo the issue is still the same of which BLAS/LAPACK backend you are using. — juanpa.arrivillaga, Dec 23 '19 at 19:42

score 3 · Answer 1 · answered Dec 27 '19 at 18:42

Indeed, Numpy and Scipy efficiently call LAPACK routines to perform numpy.linalg.inv and scipy.linalg.inv.

To inverse a general matrix, numpy.linalg.inv solves A.x=np.eye((n,n)). The function inv() calls ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj), which calls call_@lapack_func@(&params); where params.B is the identitity matrix and @lapack_func@ is one of sgesv, dgesv, cgesv, zgesv, that are linear solvers for general matrices.

On the other hand, scipy.linalg.inv calls getri , defined as get_lapack_funcs(('getri'),(a1,)). It corresponds to the DGETRI() function of lapack, designed to computes the inverse of a matrix using the LU factorization computed by DGETRF(). Hence, if you are using DGETRI() in Fortran, making use of scipy.linalg.inv() in python likely enable similar performances and results.

Most of Lapack functions can be called using scipy.linalg.lapack. Here is an example making use of scipy.linalg.cython_lapack.dgetri() in a cython module: How to compile C extension for Python where C function uses LAPACK library? Here goes a sample code, comparing scipy.linalg.cython_lapack.dgetrf()+scipy.linalg.cython_lapack.dgetri() , numpy and scipy.linalg.inv() on a 1000x1000 matrix:

import numpy as np
from scipy import linalg
import time

import myinverse
n=1000
A=np.random.rand(n,n)

start= time.time()
Am,info,string=myinverse.invert(A.copy())
end= time.time()
print 'DGETRF+DGETRI, ', end-start, ' seconds'
if info==0:
    print 'residual ',np.linalg.norm(A.dot(Am)-np.identity(n), np.inf)
else :
    print "inversion failed, info=",info, string

start= time.time()
Am=np.linalg.inv(A.copy())
end= time.time()
print 'np.linalg.inv ', end-start, ' seconds'
print 'residual ', np.linalg.norm(A.dot(Am)-np.identity(n), np.inf)

start= time.time()
Am=linalg.inv(A.copy())
end= time.time()
print 'scipy.linalg.inv ', end-start, ' seconds'
print 'residual ',np.linalg.norm(A.dot(Am)-np.identity(n), np.inf)

And the output is:

DGETRF+DGETRI,  0.22541308403  seconds
residual  4.2155882951089296e-11
np.linalg.inv  0.29932808876  seconds
residual  4.371813154546711e-11
scipy.linalg.inv  0.298856973648  seconds
residual  9.110997546690758e-11

For 2000x2000 matrix:

DGETRF+DGETRI,  1.64830899239  seconds
residual  8.541625644634121e-10
np.linalg.inv  2.02795410156  seconds
residual  7.448244269611659e-10
scipy.linalg.inv  1.61937093735  seconds
residual  1.6453560233026243e-09

A Fortran code chaining DGETRF()+DGETRI() is provided in LAPACK inversion routine strangely mixes up all variables After a few changes, let' run:

PROGRAM solvelinear
implicit none
REAL(8), dimension(1000,1000)     :: A,Ainv,M,LU
REAL(8),allocatable              :: work(:)
REAL(8)                          :: wwork
INTEGER                        :: info,lwork
INTEGER,dimension(1000)        :: ipiv
INTEGER                        :: i,j
real :: start, finish

        ! put code to test here


info=0
!work=0
ipiv=0

call RANDOM_NUMBER(A)

call cpu_time(start)
!-- LU factorisation
LU = A
CALL DGETRF(1000,1000,LU,1000,ipiv,info)

!-- Inversion of matrix A using the LU
Ainv=LU
lwork=-1
CALL DGETRI(1000,Ainv,1000,Ipiv,wwork,lwork,info)
lwork =INT( wwork+0.1)
allocate(work(lwork))
CALL DGETRI(1000,Ainv,1000,Ipiv,work,lwork,info)
deallocate(work)

call cpu_time(finish)
print '("Time = ",f6.3," seconds.")',finish-start

!-- computation of A^-1 * A to check the inverse
M = matmul(Ainv,A)

print*,"M = "
do i=1,3
  do j=1,3
    print*,M(i,j)
  enddo
end do

END PROGRAM solvelinear

Once compiled by using gfortran main2.f90 -o main2 -llapack -lblas -lm -Wall, it takes 0.42s for 1000x1000 matrix and 3s for a 2000x2000 matrix.

Finally, different performances may occur if the Fortran code and python code do not link to the same Blas/Lapack libraries. To investigate this, type commands like np.__config__.show() as shown in Link ATLAS/MKL to an installed Numpy or commands reported in How to check BLAS/LAPACK linkage in NumPy and SciPy? .

To got further and make use of distributed computations, petsc discourages inverting full matrices, as it is rarely required. It is also written that MatMatSolve(A,B,X), where B and X are dense matrices can be used to do so. Furthermore, this function is provided in the python interface petsc4py as method matSolve(self, Mat B, Mat X) for the object petsc4py.PETSc.Mat. The no-maintained-anymore Elemental library is listed as implementing a direct solver for dense matrices. While the Elemental library supported a python interface, its fork Hydrogen does not support it anymore. Nevertheless, the Elemental page lists some related open source projects for Distributed dense linear algebra. ScaLapack provides the routine PDGETRI()/PZGETRI() to invert distributed dense matrices using LU decomposition. That might leave some room for a faster inversion.

Best performance method when solving 7000x7000 linear system with python

1 Answers1

Linked