`sparse` uses matrix multiplication to select rows like this. I worked out the details of the extractor matrix in another SO question, but roughly, to get a (p, n) matrix from an (m, n) one, it multiplies on the left by a (p, m) matrix with p nonzero values (a single 1 in each row).
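To make the shapes concrete, here is a small dense sketch in plain NumPy (the toy matrix and indices are arbitrary, chosen only for illustration). The extractor is just the rows of an (m, m) identity matrix picked out by the wanted indices, so it has exactly one 1 per row:

```python
import numpy as np

# Toy (m, n) = (4, 3) matrix; the values are arbitrary, for illustration only
A = np.arange(12.0).reshape(4, 3)
indices = [2, 0]          # the p = 2 rows we want, in this order

# (p, m) extractor: row i of E is row indices[i] of the identity,
# i.e. a single 1 in column indices[i]
E = np.eye(A.shape[0])[indices]

selected = E @ A          # (p, m) @ (m, n) -> (p, n)

print(selected)                                # [[6. 7. 8.] [0. 1. 2.]]
print(np.array_equal(selected, A[indices]))    # True
print(np.count_nonzero(E))                     # 2, i.e. p nonzero values
```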
Matrix multiplication itself is a two-pass process: the first pass determines the size of the resulting matrix (how many stored elements it will have), and the second fills in the values.
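The two-pass idea can be sketched in pure Python on the raw CSR arrays. This is a simplified illustration of the usual symbolic/numeric scheme, not SciPy's compiled implementation, and `csr_matmul_two_pass` is a hypothetical name. Pass 1 only counts the distinct output columns per row, which fixes `indptr` and the total number of stored elements; pass 2 computes the values into the space reserved by pass 1.

```python
import numpy as np
from scipy import sparse


def csr_matmul_two_pass(A, B):
    """Hypothetical helper: C = A @ B for CSR inputs, computed in two passes.

    Pure-Python loops, so it is only useful for small illustrative inputs.
    """
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    if A.shape[1] != B.shape[0]:
        raise ValueError("inner dimensions do not match")
    n_rows, n_cols = A.shape[0], B.shape[1]

    # Pass 1 (symbolic): count the distinct output columns of each row,
    # which gives the output indptr and therefore the total nnz.
    indptr = np.zeros(n_rows + 1, dtype=np.int64)
    for i in range(n_rows):
        cols = set()
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k = A.indices[jj]
            cols.update(B.indices[B.indptr[k]:B.indptr[k + 1]].tolist())
        indptr[i + 1] = indptr[i] + len(cols)

    # The result size is now known, so the output arrays are allocated once.
    nnz = int(indptr[-1])
    indices = np.empty(nnz, dtype=np.int64)
    data = np.empty(nnz, dtype=np.result_type(A.dtype, B.dtype))

    # Pass 2 (numeric): accumulate the products for each row and write them
    # into the slots reserved in pass 1, with column indices sorted.
    for i in range(n_rows):
        acc = {}
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k, a_val = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[k], B.indptr[k + 1]):
                j = B.indices[kk]
                acc[j] = acc.get(j, 0) + a_val * B.data[kk]
        for offset, j in enumerate(sorted(acc)):
            indices[indptr[i] + offset] = j
            data[indptr[i] + offset] = acc[j]

    return sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))
```

When `A` is an extractor, each of its rows has a single nonzero, so pass 1 simply copies the row lengths of the selected rows of `B`. One difference from SciPy's routine: this sketch keeps entries that happen to cancel to exactly zero, whereas SciPy drops them.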
In contrast to dense numpy arrays, sparse matrix slicing never returns a view.
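A quick check of the view/copy difference, using a toy matrix (values are arbitrary): writing into a dense slice changes the original array, while writing into a sparse slice leaves the original matrix alone.

```python
import numpy as np
from scipy import sparse

dense = np.arange(12.0).reshape(3, 4)
M = sparse.csr_matrix(dense)      # built before `dense` is modified below

# Dense basic slicing returns a view: writing to it changes `dense`.
d_view = dense[1:3]
d_view[0, 0] = -1.0
print(dense[1, 0])                # -1.0

# Sparse slicing returns a new matrix: writing to it leaves `M` unchanged.
m_slice = M[1:3]
m_slice[0, 0] = -1.0              # (0, 0) of the slice is an existing stored element
print(M[1, 0])                    # 4.0
```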
The SO question "Sparse matrix slicing using list of int" has more details on the extractor matrix. I also suggest testing `csr.sum(axis=1)`, since that too uses matrix multiplication.
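Mathematically, a row sum is a product with a column of ones, so `M.sum(axis=1)` and `M @ ones` should give the same (m, 1) result. Exactly how a given SciPy version implements `sum` internally is an implementation detail, so this sketch only checks the equivalence and times the two; the matrix size and density are arbitrary assumptions.

```python
import timeit

import numpy as np
from scipy import sparse

# Arbitrary test matrix (size and density are illustrative assumptions)
M = sparse.random(1000, 800, density=0.01, format='csr')

row_sums = M.sum(axis=1)              # (1000, 1) np.matrix
ones = np.ones((M.shape[1], 1))
via_matmul = M @ ones                 # (1000, 1) ndarray

print(np.allclose(np.asarray(row_sums), via_matmul))   # True

# Rough timing comparison; numbers depend on the machine and SciPy version
print(timeit.timeit(lambda: M.sum(axis=1), number=100))
print(timeit.timeit(lambda: M @ ones, number=100))
```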
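```python
import numpy as np
from scipy import sparse

def extractor(indices, N):
    """Return a (len(indices), N) CSR matrix E such that E @ M == M[indices]."""
    indptr = np.arange(len(indices) + 1)  # exactly one stored value per row
    data = np.ones(len(indices))          # ...and that value is 1, in column indices[i]
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)
```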
So indexing every other row requires:
```
In [99]: M = sparse.random(100,80,.1, 'csr')
In [100]: M
Out[100]:
<100x80 sparse matrix of type '<class 'numpy.float64'>'
	with 800 stored elements in Compressed Sparse Row format>
In [101]: E = extractor(np.r_[1:100:2],100)
In [102]: E
Out[102]:
<50x100 sparse matrix of type '<class 'numpy.float64'>'
	with 50 stored elements in Compressed Sparse Row format>
In [103]: M1 = E*M
In [104]: M1
Out[104]:
<50x80 sparse matrix of type '<class 'numpy.float64'>'
	with 407 stored elements in Compressed Sparse Row format>
```
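To confirm that the extractor product gives the same result as indexing directly, the sketch below compares it with `M[rows]`. It repeats `extractor` so it runs on its own and uses `@`, which for `csr_matrix` objects is the same product as `*` (for the newer `csr_array` type, `*` is elementwise). How SciPy implements `M[rows]` internally may vary between versions; this only checks that the values agree.

```python
import numpy as np
from scipy import sparse

def extractor(indices, N):
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)

M = sparse.random(100, 80, density=0.1, format='csr')
rows = np.r_[1:100:2]                 # every other row: 1, 3, ..., 99

E = extractor(rows, M.shape[0])
M1 = E @ M                            # selection via the extractor product
M2 = M[rows]                          # direct indexing

print(M1.shape, M2.shape)                              # (50, 80) (50, 80)
print(np.array_equal(M1.toarray(), M2.toarray()))      # True
```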