I have a situation in which I need to extract a single row from a sparse matrix and take its dot product with a dense row. With scipy's csr_matrix, this appears to be significantly slower than numpy's dense array multiplication. This is surprising to me because I expected the sparse dot product to involve significantly fewer operations. Here is an example:

import timeit as ti

sparse_setup = 'import numpy as np; import scipy.sparse as si;' + \
               'u = si.eye(10000).tocsr()[10];' + \
               'v = np.random.randint(100, size=10000)'

dense_setup  = 'import numpy as np; u = np.eye(10000)[10];' + \
               'v = np.random.randint(100, size=10000)'

>>> ti.timeit('u.dot(v)', setup=sparse_setup, number=100000)
2.788649031019304

>>> ti.timeit('u.dot(v)', setup=dense_setup, number=100000)
2.179030169005273

For matrix-vector multiplication, the sparse representation wins hands down, but not in this case. I tried with csc_matrix, but performance is even worse:

>>> sparse_setup = 'import numpy as np; import scipy.sparse as si;' + \
...                'u = si.eye(10000).tocsc()[10];' + \
...                'v = np.random.randint(100, size=10000)'
>>> ti.timeit('u.dot(v)', setup=sparse_setup, number=100000)
7.0045155879925005

Why does numpy beat scipy.sparse in this case? Is there a matrix format that's faster for these kinds of computations?
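For reference, here is a minimal sketch (same setup as above) of the full matrix-vector case, where the sparse representation does pay off; since the matrix is the identity, the result can be checked exactly:

```python
import numpy as np
import scipy.sparse as si

# Full matrix-vector product: the dense version touches all
# 10000 * 10000 entries, the sparse one only the stored diagonal.
A_sparse = si.eye(10000).tocsr()
A_dense = np.eye(10000)
v = np.random.randint(100, size=10000)

# Both give the same result (identity times v is v).
assert np.array_equal(A_sparse.dot(v), v)
assert np.array_equal(A_dense.dot(v), v)
```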

castle-bravo
  • Sparse matrices save memory at the expense of more complexity when it comes to computations. – MaxNoe Nov 20 '15 at 02:08
  • How are you counting 'operations'? Just multiplications and additions? Or are you considering indexing, iteration, etc.? On modern processors, multiplying 2 numbers is not an expensive operation. Also, `dot` gets delegated to fast numeric libraries. The sparse multiplication is also compiled, but not with the same optimized libraries. – hpaulj Nov 20 '15 at 02:08
  • Just to be clear, you are testing a very sparse vector (1 nonzero out of 10000) times a dense vector of the same size. It ends up using, I think, `sparse._sparsetools.csr_matvec`, a compiled function. I'd have to look at the scipy github to dig further. – hpaulj Nov 20 '15 at 02:35
  • @hpaulj, I think of a sparse matrix as a collection of row-column-value tuples. Dot product of a k-element sparse row vector with an n-element dense vector should only require O(k) operations (with higher constant factors than dense), while dense multiplication should take O(n). I will take your good advice and read the source. – castle-bravo Nov 20 '15 at 15:55
  • When I vary the number of nonzero values in `u` (and the dense equivalent), there is no variation in the timings. `(u.data * v[u.indices]).sum()` is closer to what you imagine happening. It is somewhat faster, but its time is still not O(k). – hpaulj Nov 20 '15 at 18:17
  • Yes, this is exactly what I imagined happening. Prompted by your comment, I tried using the DOK format instead of CSR. This results in a 3.5× speedup and is faster than the dense dot product. – castle-bravo Nov 20 '15 at 20:15
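To make hpaulj's suggestion concrete, here is a sketch that computes the row dot product directly from the CSR internals (`data` and `indices`). This skips the Python-level dispatch in `csr_matrix.dot`, at the cost of bypassing scipy's input validation:

```python
import numpy as np
import scipy.sparse as si

u = si.eye(10000).tocsr()[10]          # 1 x 10000 sparse row, one nonzero
v = np.random.randint(100, size=10000)

# Multiply only the stored entries against the matching dense entries,
# then sum: O(k) work for k nonzeros.
manual = (u.data * v[u.indices]).sum()

# Same value as the full sparse dot product (which returns a length-1 array).
assert manual == u.dot(v)[0]
```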

1 Answer

The CSR/CSC vector product call has a few microseconds of overhead per call, from executing a small amount of Python code and from handling arguments in compiled code (scipy.sparse._sparsetools.csr_matvec).

On modern processors, computing vector dot products is very fast, so the overheads dominate the computing time in this case. A full matrix-vector product is itself much more expensive, so a similar overhead is not visible there.

Why, then, are the overheads smaller for NumPy? This is mainly due to better optimization of that code path; the performance of csr_matrix can likely be improved here.
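One way to see the overhead at work (a quick sketch; exact timings depend on the machine) is to vary the number of nonzeros k in the row. If a fixed per-call overhead dominates, the per-call time stays nearly flat for small k rather than growing with k:

```python
import timeit
import numpy as np
import scipy.sparse as si

v = np.random.randint(100, size=10000)

for k in (1, 10, 100, 1000):
    # Build a 1 x 10000 CSR row with k nonzeros (ones in the first k columns).
    row = si.csr_matrix(
        (np.ones(k), (np.zeros(k, dtype=int), np.arange(k))),
        shape=(1, 10000))
    t = timeit.timeit(lambda: row.dot(v), number=10000)
    print(k, t)  # expect nearly flat times for small k
```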

pv.