Scipy sparse matrix multiplication much slower than numpy array

Question

I have constructed the following case to test one-dimensional sparse matrix multiplication vs numpy arrays.

from scipy.sparse import csc_matrix
sp = csc_matrix((1, 36710))
sp[0,4162] = 0.2335
sp[0,21274] = 0.1367
sp[0,27322] = 0.261
sp[0,27451] = 0.9266

%timeit sp.dot(sp.T)
arr = sp.toarray()[0]
%timeit arr.dot(arr)

The result is as follows:

267 µs ± 6.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
9.9 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Also they are both slower than a plain dict storing entries and a for-loop for the multiplication (~1µs).

The result is the same after trying different type of sparse matrix, including csr/coo. Why is sparse matrix multiplication ~30 times slower than numpy dense array multiplication? Is it because the matrix is too sparse?

There is an even faster solution: `sp.data.dot(sp.data)`, `3.3ns per loop` — Nils Werner, Sep 08 '17 at 09:06
@NilsWerner I probably need to change the example a bit, but I need to apply multiplication to different matrices, in which case your solution will give a wrong answer — Guoxing Li, Sep 08 '17 at 09:21
I suppose there is not much to be gained by using a sparse vector. The situation really changes if you use sparse matrices (nxm, n,m>1). — laolux, Sep 08 '17 at 10:33
Try matrices with 1% and 10% sparsity. This case might be too sparse with sparse matrix setup dominating actual calculations. — hpaulj, Sep 08 '17 at 11:37

score 2 · Answer 1 · answered Jul 30 '20 at 05:38

2

In hpaulj's answer, M*M is not a matrix multiplication -- it's just an element-wise multiplication. This is why M*M is much faster than matrix multiplication. So in all case matrix multiplication is much slower for csr matrix.

answered Jul 30 '20 at 05:38

Hojin Yang

21
2

1

`M = sparse.random(250,500,format='csr')` & `M * M` gives `ValueError: dimension mismatch`. `*` is element-wise for numpy arrays and dot product for sparse arrays – CJR Jul 30 '20 at 14:57

score 1 · Accepted Answer · answered Sep 09 '17 at 02:58

Your 'vector' calc with a random sparse matrix, with the default sparsity 0.01.

In [523]: M = sparse.random(1,50000,format='csr')
In [524]: timeit M*M.T
479 µs ± 289 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [525]: A = M.A
In [526]: timeit np.dot(A,A.T)
40.1 µs ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

So sparse is 10x slower. (A*A).sum() times as 130 µs.

In [531]: M
Out[531]: 
<1x50000 sparse matrix of type '<class 'numpy.float64'>'
    with 500 stored elements in Compressed Sparse Row format>

But making a square matrix (with 5x nonzero terms):

In [537]: M = sparse.random(500,500,format='csr')
In [538]: M
Out[538]: 
<500x500 sparse matrix of type '<class 'numpy.float64'>'
    with 2500 stored elements in Compressed Sparse Row format>
In [539]: A=M.A
In [540]: timeit M*M
416 µs ± 4.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [541]: timeit A@A
13.4 ms ± 81.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Now sparse has a substantial speed advantage.

The calculation methods are so different that it is hard to identify anyone thing that accounts for the time differences.

Is sparse matrix-vector multiplication faster in Matlab than in Python?

Directly use Intel mkl library on Scipy sparse matrix to calculate A dot A.T with less memory

Why is vector dot product slower with scipy's sparse csr_matrix than numpy's dense array?

Scipy sparse matrix multiplication much slower than numpy array

2 Answers2

Linked