Here is the code:
# input:
# A : a large CSR matrix (365 million rows and 1.3 billion entries), 32-bit float datatype
# get the two largest eigenvalues of A and the corresponding eigenvectors
from scipy.sparse.linalg import eigsh
w, V = eigsh(A, k=2, tol=10e-2, ncv=5)  # note: 10e-2 is 0.1
As far as I can tell, there is not much room to go wrong here. My machine starts out with plenty of memory (90 GB including swap), but the memory usage of eigsh slowly creeps up during the run until the machine runs out of memory. Is there something obvious I am missing?
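For reference, here is a rough back-of-the-envelope estimate of the baseline memory the call should need. It is only a sketch: it treats the row and entry counts above as round numbers, assumes the CSR index arrays are 32-bit integers (plausible since 1.3 billion is below 2^31, but not verified), and assumes the usual ARPACK symmetric-driver workspace of a residual vector, an n-by-ncv Lanczos basis, and a 3n work array.

```python
# Back-of-the-envelope memory estimate for A plus the ARPACK workspace.
# Assumptions (not verified): 32-bit CSR indices; workspace consists of
# resid (n), Lanczos basis v (n * ncv) and workd (3 * n), all float32.
n_rows = 365_000_000
nnz = 1_300_000_000
ncv = 5
bytes_f32 = 4
bytes_i32 = 4

# CSR storage: data + column indices + row pointers
csr_bytes = nnz * bytes_f32 + nnz * bytes_i32 + (n_rows + 1) * bytes_i32

# ARPACK workspace: resid + v + workd (workl is only O(ncv^2), ignored here)
arpack_bytes = (1 + ncv + 3) * n_rows * bytes_f32

total_bytes = csr_bytes + arpack_bytes
print(f"CSR matrix : {csr_bytes / 1e9:.2f} GB")
print(f"ARPACK work: {arpack_bytes / 1e9:.2f} GB")
print(f"total      : {total_bytes / 1e9:.2f} GB")
```

Under these assumptions the baseline comes to about 25 GB, well below the 90 GB available, so the fixed allocations alone would not explain running out of memory.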
What I have tried:
--Looked through the source. There is a lot of it, but as far as I could see, the Python code allocates no memory between iterations. I am less comfortable with Fortran, but it would be surprising if ARPACK itself or the calling routine allocated memory.
--Tried the equivalent in Octave (a MATLAB clone), with similar results, although the effect is less obvious there: the datatype is necessarily double precision, so memory is more constrained from the start. So perhaps the problem is in ARPACK itself?
--Googled a bunch. It looks like SciPy uses (or used?) a circular reference somewhere that has caused others grief when calling eigsh multiple times, but I am calling it only once, so this may not be the same issue.
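A possible further diagnostic, sketched here on a small stand-in matrix, is to wrap A in a LinearOperator and record the traced memory at every matrix-vector product, to see whether the growth tracks the iterations. The helper name `traced_operator` and the tridiagonal test matrix are made up for illustration. Note that `tracemalloc` only sees allocations made through Python's allocator (NumPy arrays included), so anything allocated inside compiled code would not show up.

```python
import tracemalloc

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, eigsh


def traced_operator(A, log):
    """Wrap A so each matrix-vector product appends the traced bytes to log."""
    def matvec(x):
        y = A @ x
        current, _peak = tracemalloc.get_traced_memory()
        log.append(current)
        return y
    return LinearOperator(A.shape, matvec=matvec, dtype=A.dtype)


# Small symmetric stand-in for the real matrix (hypothetical test data):
# a float32 tridiagonal matrix in CSR format.
n = 200
A_small = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n),
                format="csr", dtype=np.float32)

log = []
tracemalloc.start()
w, V = eigsh(traced_operator(A_small, log), k=2, tol=10e-2, ncv=5)
tracemalloc.stop()

print(len(log), "matvecs; traced bytes at first/last call:", log[0], log[-1])
```

If the logged values stay flat while the process's resident memory still grows, the growth is happening somewhere this tracing cannot see.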
Any help would be greatly appreciated.