I created a huge sparse matrix in CSR format, and I need to iterate through its rows in random order and do a dot product on each one. I found that this is much slower than doing the same thing with a dense array. Here is the benchmark:

In [1]: a = sp.csr_matrix(np.random.rand(10000, 10000))
In [2]: b = a.todense()

In [126]: %timeit a[1357]
10000 loops, best of 3: 78.1 µs per loop

In [127]: %timeit b[1357]
The slowest run took 6.80 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.49 µs per loop

Row indexing on the dense array is about 30x faster than on the csr_matrix. Am I doing this right, and how can I improve it?

avocado
  • Possible duplicate of [why is row indexing of scipy csr matrices slower compared to numpy arrays](https://stackoverflow.com/questions/34010334/why-is-row-indexing-of-scipy-csr-matrices-slower-compared-to-numpy-arrays) – sascha Aug 20 '17 at 14:50
  • @sascha yes, it could be a dup, and I did read that one, but that answer only explains why it's slow. I didn't see (did I miss anything?) any solution to improve the speed. – avocado Aug 20 '17 at 14:55
  • In regards to this question: it's too broad (depends very much on the use-case) or do you think it's slow because it's wanted? :-) – sascha Aug 20 '17 at 14:57
  • The dense selection returns a `view`. The sparse selection creates a new sparse matrix with new attributes. But why do a dot product on rows individually? Try to work with whole matrix without iteration. – hpaulj Aug 20 '17 at 16:11
  • @hpaulj, actually doing a mini-batch iterative algo, so – avocado Aug 21 '17 at 03:42
  • When selecting multiple rows, `sparse` uses matrix multiplication - with an appropriate 'extractor' matrix. https://stackoverflow.com/questions/39500649/sparse-matrix-slicing-using-list-of-int – hpaulj Aug 21 '17 at 04:12
  • @hpaulj, however, if I access `.data` via `.indptr`, it's much, much faster, so I'm thinking of getting the data from the sparse matrix that way and doing an element-wise product and summing it, instead of doing sparse matrix row indexing and a dot product. – avocado Aug 21 '17 at 05:27
  • @avocado were you able to solve the problem? Would you mind sharing your solution? I am coding a Stochastic Gradient Descent method that goes through the rows randomly, and it would be helpful to know what you did in the end. Thanks – melqkiades Oct 08 '19 at 10:36
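
The idea in the comments above (read one row straight from `.indptr`, `.indices` and `.data` instead of slicing out a new sparse matrix) might look roughly like the sketch below. It is not a confirmed solution from this thread. The helper name `csr_row_dot`, the dense vector `w`, and the small test matrix are all hypothetical and only there for illustration.

```python
import numpy as np
import scipy.sparse as sp


def csr_row_dot(a, i, w):
    """Dot product of row i of a CSR matrix with a dense 1-D vector w.

    Hypothetical helper: it reads the row directly from the CSR buffers,
    so no intermediate sparse matrix is built for the row.
    """
    start, end = a.indptr[i], a.indptr[i + 1]  # extent of row i in data/indices
    cols = a.indices[start:end]                # column indices of stored entries
    vals = a.data[start:end]                   # the stored values themselves
    return np.dot(vals, w[cols])               # element-wise product, then sum


# Small illustrative matrix (assumption: roughly 5% of entries are nonzero).
rng = np.random.default_rng(0)
dense = rng.random((200, 200))
dense[dense < 0.95] = 0.0
a = sp.csr_matrix(dense)
w = rng.random(200)

i = 57
fast = csr_row_dot(a, i, w)
reference = a[i].toarray().ravel() @ w  # same value via ordinary row indexing
```

For the random-order iteration described in the question, something like `for i in rng.permutation(a.shape[0]): csr_row_dot(a, i, w)` would visit each row once. Whether this is fast enough depends on the number of stored entries per row, so it is worth timing on the real matrix.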

0 Answers