Parallel Scipy COO Matrix Computations

Question

I am trying to perform sparse matrix computations with scipy for an algorithm that requires intensive, dependent computations (PageRank) on very large RDF datasets. I want to use multiple cores for the scipy calculation in the following code:

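import numpy as np
from scipy import sparse

# y is a dict with 'data', 'row', 'col' and 'shape' keys, loaded elsewhere
F = sparse.coo_matrix((y['data'], (y['row'], y['col'])), shape=y['shape'])
W = sparse.coo_matrix((y['data'], (y['row'], y['col'])), shape=y['shape'])
P = sparse.bmat([[None, W], [F, None]])
PT = P.T.tocsr()  # transpose and convert to CSR once, not on every iteration
n = P.shape[0]
damping, epsilon, printerror = 0.85, 1e-8, False  # assumed settings
previous = np.ones(n) / n
ones = np.ones(n) / n
error = np.inf
while error > epsilon:
    tmp = previous.copy()
    previous = damping * PT.dot(previous) + (1 - damping) * ones
    error = np.linalg.norm(tmp - previous)
    if printerror:
        print(error)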
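The loop is a power iteration. Written out with d for damping and ε for epsilon, each step is shown below. It also makes the "dependent" part explicit: step k+1 needs the full result of step k, so the only place multiple cores can help is inside the single mat-vec product of each step.

$$
\mathbf{x}^{(0)} = \frac{1}{n}\mathbf{1}, \qquad
\mathbf{x}^{(k+1)} = d\,P^{\mathsf T}\mathbf{x}^{(k)} + \frac{1-d}{n}\mathbf{1}, \qquad
\text{stop when } \left\lVert \mathbf{x}^{(k+1)} - \mathbf{x}^{(k)} \right\rVert_2 \le \varepsilon .
$$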

I have searched every answer I could find and tried integrating MKL (the Anaconda build) into the code, but the performance does not scale with the number of cores. My understanding is that scipy's csr.h does not make any BLAS calls, so I am wondering whether I need to replace the call to csr_matvec in scipy/sparsetools with an appropriate Sparse BLAS call (MKL provides those) and then link scipy against MKL. Am I misunderstanding or missing something? I would really appreciate some help with this. A similar question is here. Thanks!
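One way to test whether the mat-vec can use several cores, without touching scipy's C++ code, is to split P.T into row blocks and multiply each block in its own thread. This is a sketch with hypothetical helper names (`split_rows`, `parallel_matvec`, `pagerank`), and it rests on an assumption: scipy's compiled CSR mat-vec has to release the GIL for the threads to run concurrently. Recent scipy versions appear to do this for numeric dtypes, but verify it by timing on your own build. Even then, sparse mat-vec is usually limited by memory bandwidth, so speedups tend to flatten out well before the core count.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy import sparse


def split_rows(A, n_chunks):
    """Split a matrix into contiguous CSR row blocks (done once, outside the loop)."""
    A = sparse.csr_matrix(A)
    bounds = np.linspace(0, A.shape[0], n_chunks + 1).astype(int)
    return [(lo, hi, A[lo:hi]) for lo, hi in zip(bounds[:-1], bounds[1:])]


def parallel_matvec(blocks, x, out, pool):
    """Compute out = A @ x, one row block per task; blocks write disjoint slices."""
    def work(block):
        lo, hi, A_block = block
        out[lo:hi] = A_block.dot(x)

    # list() waits for all tasks and re-raises any exception from a worker
    list(pool.map(work, blocks))
    return out


def pagerank(PT, damping=0.85, epsilon=1e-8, n_threads=4, max_iter=1000):
    """Same power iteration as the question's loop, with a threaded mat-vec.

    PT plays the role of P.T in the question (assumed square).
    """
    n = PT.shape[0]
    blocks = split_rows(PT, n_threads)
    teleport = np.full(n, (1 - damping) / n)  # (1 - damping) * ones
    x = np.full(n, 1.0 / n)
    buf = np.empty(n)  # reused output buffer for the mat-vec
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for _ in range(max_iter):
            y = damping * parallel_matvec(blocks, x, buf, pool) + teleport
            if np.linalg.norm(y - x) <= epsilon:
                return y
            x = y
    return x
```

The row blocks are sliced once before the loop, so each iteration only pays for the products; the iterations themselves stay sequential because each one needs the previous vector.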

(1) Adding support for this is probably a lot of pain. (2) I'm not sure BLAS helps (alignment). OpenMP could, but it seems [scipy-people are scared about that dependency](https://github.com/scipy/scipy/issues/1196). (3) This code is incomplete (where is previous defined? W is unused), and maybe you should work on the algorithmics rather than parallelizing. Maybe a [LinearOperator](https://docs.scipy.org/doc/scipy-1.0.0/reference/generated/scipy.sparse.linalg.LinearOperator.html#scipy.sparse.linalg.LinearOperator) can come into play. (4) And the COO-matrix part is misleading: it will never be part of the mat-vec multiplication. — sascha, Dec 20 '17 at 15:04
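The LinearOperator suggestion could look roughly like the sketch below. It assumes F and W have the same shape, as in the question's code, and `block_transpose_operator` is a hypothetical helper, not a scipy function. Because P = [[0, W], [F, 0]], its transpose is [[0, F.T], [W.T, 0]], so P.T can be applied blockwise without assembling the bmat matrix, with F and W transposed and converted to CSR only once. This does not by itself use more cores, although the two half-products are independent of each other and could in principle run concurrently.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator


def block_transpose_operator(F, W):
    """LinearOperator for P.T with P = [[0, W], [F, 0]], without calling bmat.

    P.T @ [x1; x2] = [F.T @ x2; W.T @ x1], where x1 and x2 each have W.shape[0] entries.
    """
    r, c = W.shape
    if F.shape != (r, c):
        raise ValueError("F and W must have the same shape")
    FT = sparse.csr_matrix(F.T)  # transpose + CSR conversion done once
    WT = sparse.csr_matrix(W.T)

    def matvec(x):
        x = np.ravel(x)
        x1, x2 = x[:r], x[r:]
        # the two block products do not depend on each other
        return np.concatenate([FT.dot(x2), WT.dot(x1)])

    return LinearOperator(shape=(2 * c, 2 * r), matvec=matvec, dtype=FT.dtype)
```

In the question's loop, `P.T.dot(previous)` would become `op.matvec(previous)`, with `op = block_transpose_operator(F, W)` built once before the loop.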
I just realized there is indeed something like sparse BLAS in MKL. Cool. But the question is whether it can work with scipy's sparse matrices and how many changes to the code would be needed. — sascha, Dec 20 '17 at 15:07
@sascha Thank you for pointing out the mistakes in the code; I have fixed them. I cannot change the core algorithm, as the objective is to check the scalability of the PageRank implementation on multiple cores. I have looked at sparse BLAS in MKL, but it does not work with scipy's sparse matrices (when I run it on multiple cores, the run time only increases as I add cores). — Kunal Jha, Dec 20 '17 at 15:42
Look at this attempt to use `mkl`: https://stackoverflow.com/q/37536106/901925 — hpaulj, Dec 20 '17 at 16:41
