I created a huge sparse matrix in CSR format, and I need to iterate over its rows in random order and do a dot operation on each one.
I found that this is much slower than doing the same thing with a dense array.
Here is the benchmark (np is numpy, sp is scipy.sparse):
In [1]: a = sp.csr_matrix(np.random.rand(10000, 10000))
In [2]: b = a.todense()
In [126]: %timeit a[1357]
10000 loops, best of 3: 78.1 µs per loop
In [127]: %timeit b[1357]
The slowest run took 6.80 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.49 µs per loop
Row indexing on the dense array is about 30x faster than on the csr_matrix.
Am I doing this right, and how can I improve it?
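
For context, here is a minimal sketch of one way the per-row dot product could be computed without calling `a[i]`, which builds a new sparse matrix on every access. It reads the row directly from the CSR arrays `indptr`, `indices` and `data`. The helper name `row_dot`, the smaller matrix size and the 1% density are assumptions made for illustration and are not part of the benchmark above. Whether this is actually faster for a given workload would need to be measured.

```python
import numpy as np
import scipy.sparse as sp


def row_dot(a, i, v):
    """Dot product of row i of a CSR matrix with a dense 1-D vector v.

    Hypothetical helper: it slices the raw CSR arrays instead of
    building a one-row sparse matrix with a[i].
    """
    start, end = a.indptr[i], a.indptr[i + 1]
    # indices[start:end] are the column indices of the entries stored in row i,
    # data[start:end] are the matching values.
    return a.data[start:end] @ v[a.indices[start:end]]


# Assumed test data: a 1000 x 1000 matrix with about 1% stored entries.
a = sp.random(1000, 1000, density=0.01, format="csr", random_state=0)
v = np.random.default_rng(0).random(1000)

i = 357
result = row_dot(a, i, v)
expected = a[i].dot(v)[0]  # same value through the normal indexing path
```

Both paths can then be compared with `%timeit row_dot(a, i, v)` and `%timeit a[i].dot(v)`.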