Sparse matrix slicing memory error

Question

I have a sparse matrix csr:

<681881x58216 sparse matrix of type '<class 'numpy.int64'>'
    with 2867209 stored elements in Compressed Sparse Row format>

And I want to create a new sparse matrix as a slice of csr: csr_2 = csr[1::2,:].

Problem: with only the csr matrix loaded, my server uses 40 GB of RAM. When I run csr_2 = csr[1::2,:], memory usage climbs until all 128 GB are used and the process fails with "Memory error".

Your matrix itself in your example is just 22MB (values) plus some auxiliary arrays, probably <80MB of memory. So are you sure that's the source of your problem (something else on your server is probably using 39GB of memory)? (And slicing sparse matrices will induce a copy, by the way.) — sascha, Sep 04 '17 at 10:36
(1) This slice takes every other row, starting from the second one (the odd-indexed rows). (2) The server has lots of Docker containers and other maintenance processes running, together taking 40GB — Ladenkov Vladislav, Sep 04 '17 at 10:38
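sascha's back-of-the-envelope figure can be reproduced by adding up the three arrays a CSR matrix stores: data (one value per stored element), indices (one column index per stored element) and indptr (one offset per row, plus one). This is a sketch that assumes int64 values, as in the question, and 32-bit index arrays, which SciPy typically uses when the indices fit; the helper names are hypothetical.

```python
from scipy import sparse


def estimate_csr_bytes(n_rows, nnz, data_itemsize=8, index_itemsize=4):
    """Bytes held by the data, indices and indptr arrays of a CSR matrix."""
    data = nnz * data_itemsize               # one value per stored element
    indices = nnz * index_itemsize           # one column index per stored element
    indptr = (n_rows + 1) * index_itemsize   # row offsets: n_rows + 1 entries
    return data + indices + indptr


def csr_nbytes(m):
    """Actual size of the three arrays of an existing CSR matrix."""
    return m.data.nbytes + m.indices.nbytes + m.indptr.nbytes


# The matrix from the question: 681881 rows, 2867209 stored int64 values,
# assuming int32 index arrays.
print(estimate_csr_bytes(681881, 2867209))  # 37134036 bytes, about 37 MB

# Sanity check of the formula on a small random matrix, using its real dtypes.
small = sparse.random(50, 40, density=0.1, format="csr")
print(csr_nbytes(small) == estimate_csr_bytes(
    small.shape[0], small.nnz, small.data.itemsize, small.indices.itemsize))  # True
```

The result, roughly 37 MB, is in line with the "<80MB" estimate in the comment.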

hpaulj · Accepted Answer · 2017-09-04T16:26:17.513

scipy.sparse uses matrix multiplication to select rows like this. I worked out the details of the extractor matrix in another SO question, but roughly: to get a (p, n) matrix from an (m, n) one, it needs to use a (p, m) matrix (with p nonzero values).
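In symbols (the notation here is mine, not from the answer): if r_0, ..., r_{p-1} are the rows to keep, the extractor E has exactly one 1 per row, so multiplying it into M copies exactly those rows.

```latex
E \in \{0,1\}^{p \times m}, \qquad
E_{ij} =
\begin{cases}
  1 & \text{if } j = r_i,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
(EM)_{i,:} = \sum_{j=0}^{m-1} E_{ij}\, M_{j,:} = M_{r_i,:}, \quad i = 0,\dots,p-1.
```

For csr[1::2,:] the kept rows are r_i = 2i + 1, so p = floor(m/2); with m = 681881 that is an extractor of shape (340940, 681881) holding 340940 nonzeros.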

Matrix multiplication itself is a two-pass process. The first pass determines the size of the resulting matrix (its number of nonzeros); the second fills in the values.
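To make the two passes concrete, here is a pure-Python sketch of a CSR-times-CSR product using Gustavson's row-by-row scheme. It illustrates the count-then-fill idea only; it is not SciPy's compiled implementation, and the function name is hypothetical. The first pass counts how many distinct columns each output row touches, so the output arrays can be allocated once; the second pass accumulates the values.

```python
import numpy as np
from scipy import sparse


def two_pass_csr_matmul(A, B):
    """Illustrative two-pass CSR x CSR product (hypothetical helper)."""
    A = sparse.csr_matrix(A)
    B = sparse.csr_matrix(B)
    n_rows, n_cols = A.shape[0], B.shape[1]

    # Pass 1: count the distinct output columns of each row (structure only).
    row_nnz = np.zeros(n_rows, dtype=np.int64)
    for i in range(n_rows):
        cols = set()
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k = A.indices[jj]
            cols.update(B.indices[B.indptr[k]:B.indptr[k + 1]].tolist())
        row_nnz[i] = len(cols)

    # Allocate the result exactly once, sized by pass 1.
    indptr = np.zeros(n_rows + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(row_nnz)
    indices = np.empty(indptr[-1], dtype=np.int64)
    data = np.empty(indptr[-1], dtype=np.result_type(A.dtype, B.dtype))

    # Pass 2: accumulate the values of each output row.
    for i in range(n_rows):
        acc = {}
        for jj in range(A.indptr[i], A.indptr[i + 1]):
            k, a = A.indices[jj], A.data[jj]
            for kk in range(B.indptr[k], B.indptr[k + 1]):
                col = B.indices[kk]
                acc[col] = acc.get(col, 0) + a * B.data[kk]
        cols = sorted(acc)
        start = indptr[i]
        indices[start:start + len(cols)] = cols
        data[start:start + len(cols)] = [acc[c] for c in cols]

    return sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))


A = sparse.random(20, 15, density=0.2, format="csr")
B = sparse.random(15, 10, density=0.3, format="csr")
print(np.allclose(two_pass_csr_matmul(A, B).toarray(), (A @ B).toarray()))  # True
```

Both passes walk the same (row of A, row of B) pairs, so the counts from pass 1 always match what pass 2 writes.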

In contrast to dense numpy arrays, sparse matrix slicing never returns a view.
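A quick way to see the difference is np.shares_memory: a basic slice of a dense array points into the original buffer, while a stepped row slice of a CSR matrix comes back with its own data and indptr arrays. The small matrix below is an arbitrary example.

```python
import numpy as np
from scipy import sparse

dense = np.arange(24).reshape(6, 4)
dense_slice = dense[1::2, :]
# Basic slicing of a dense array returns a view onto the same buffer.
print(np.shares_memory(dense_slice, dense))  # True

S = sparse.csr_matrix(dense)
S_slice = S[1::2, :]
# The sparse slice is built from newly allocated arrays.
print(np.shares_memory(S_slice.data, S.data))      # False
print(np.shares_memory(S_slice.indptr, S.indptr))  # False
```

So even a plain slice allocates new arrays, in addition to any temporaries created along the way.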

The SO question "Sparse matrix slicing using list of int" has details on the extractor matrix. I also suggest testing csr.sum(axis=1), since that too uses matrix multiplication.

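import numpy as np
from scipy import sparse

def extractor(indices, N):
    # (len(indices), N) selector: row i holds a single 1 in column indices[i]
    indptr = np.arange(len(indices) + 1)
    data = np.ones(len(indices))
    shape = (len(indices), N)
    return sparse.csr_matrix((data, indices, indptr), shape=shape)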

So indexing every other row requires:

In [99]: M = sparse.random(100,80,.1, 'csr')
In [100]: M
Out[100]: 
<100x80 sparse matrix of type '<class 'numpy.float64'>'
    with 800 stored elements in Compressed Sparse Row format>
In [101]: E = extractor(np.r_[1:100:2],100)
In [102]: E
Out[102]: 
<50x100 sparse matrix of type '<class 'numpy.float64'>'
    with 50 stored elements in Compressed Sparse Row format>
In [103]: M1 = E*M
In [104]: M1
Out[104]: 
<50x80 sparse matrix of type '<class 'numpy.float64'>'
    with 407 stored elements in Compressed Sparse Row format>
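To confirm that the product really reproduces the slice, it can be compared with M[1::2, :] directly. This sketch builds the selector from COO triplets (row i, column rows[i], value 1) rather than with the answer's extractor; that construction and the name selection_matrix are my own, but the result has the same pattern of one 1 per row.

```python
import numpy as np
from scipy import sparse


def selection_matrix(rows, n_rows):
    """(p, n_rows) matrix with a single 1 per row, in column rows[i] (hypothetical helper)."""
    rows = np.asarray(rows)
    p = len(rows)
    return sparse.csr_matrix((np.ones(p), (np.arange(p), rows)), shape=(p, n_rows))


M = sparse.random(100, 80, density=0.1, format="csr")
E = selection_matrix(np.r_[1:100:2], 100)
M1 = E @ M

print(E.shape, E.nnz)                                      # (50, 100) 50
print(M1.shape)                                            # (50, 80)
print(np.array_equal(M1.toarray(), M[1::2, :].toarray()))  # True
```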

So, the solution you propose is to use the extractor function? — Ladenkov Vladislav, Sep 05 '17 at 08:17
No, I'm just suggesting a reason why you could be getting the memory error. But without your data, RAM, etc., I can't prove it. — hpaulj, Sep 05 '17 at 12:34
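The caveat above can be explored without the full dataset by measuring peak allocation on a scaled-down matrix: the slice itself, csr.sum(axis=1) (the diagnostic suggested in the answer), and a row gather that avoids matrix multiplication. This is a sketch under several assumptions: the sizes are hypothetical, csr_take_rows and peak_alloc_bytes are illustrative helpers, and tracemalloc only sees allocations made through Python and NumPy, so scratch memory inside SciPy's compiled code may not be counted. Newer SciPy versions may also slice without multiplication.

```python
import tracemalloc

import numpy as np
from scipy import sparse


def peak_alloc_bytes(fn):
    """Peak bytes traced by tracemalloc while fn() runs (hypothetical helper)."""
    tracemalloc.start()
    try:
        fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def csr_take_rows(m, rows):
    """Gather whole rows straight from the CSR arrays, no matrix product (hypothetical helper)."""
    rows = np.asarray(rows)
    starts = m.indptr[rows]
    lengths = m.indptr[rows + 1] - starts
    indptr = np.zeros(len(rows) + 1, dtype=np.int64)
    indptr[1:] = np.cumsum(lengths)
    # Element k of output row r lives at starts[r] + (k - indptr[r]) in m.data.
    pos = np.repeat(starts - indptr[:-1], lengths) + np.arange(indptr[-1])
    return sparse.csr_matrix(
        (m.data[pos], m.indices[pos], indptr), shape=(len(rows), m.shape[1])
    )


# Scaled-down stand-in for the real matrix (hypothetical sizes).
csr = sparse.random(2000, 500, density=0.01, format="csr")
odd_rows = np.arange(1, csr.shape[0], 2)

print("slice  :", peak_alloc_bytes(lambda: csr[1::2, :]))
print("sum    :", peak_alloc_bytes(lambda: csr.sum(axis=1)))
print("gather :", peak_alloc_bytes(lambda: csr_take_rows(csr, odd_rows)))
```

If, on a given SciPy version, the slice peaks far above the gather, the multiplication explanation becomes more plausible; if the numbers are similar, the cause probably lies elsewhere.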
