
I have a sparse matrix X

<1000000x153047 sparse matrix of type '<class 'numpy.float64'>'
with 5082518 stored elements in Compressed Sparse Column format>

and I have an array

columns_to_use 

It consists of 10000 column indices of matrix X. I want to keep only these columns and drop the rest. I tried the following code:

X_new = X[:, columns_to_use]

This works fine with a small X (10,000 rows), but with 100,000 rows or more I get a memory error. How can I select specific columns without running into a memory error?

malugina
  • Sparse column selection is done with matrix multiplication, as described in https://stackoverflow.com/questions/39500649/sparse-matrix-slicing-using-list-of-int. Such a selection will create a new sparse matrix. Have you been able to do any other calculations with this large matrix? Make a copy? – hpaulj Jan 04 '18 at 17:06
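For reference, here is a minimal sketch of the matrix-multiplication selection mentioned in the comment above, assuming X is the CSC matrix and columns_to_use is a 1-D array of column indices as in the question:

import numpy as np
from scipy import sparse

# Build a selector matrix S of shape (X.shape[1], len(columns_to_use)) with a
# single 1 per column; X @ S then keeps exactly the requested columns, in order.
n_selected = len(columns_to_use)
S = sparse.csc_matrix(
    (np.ones(n_selected), (columns_to_use, np.arange(n_selected))),
    shape=(X.shape[1], n_selected),
)
X_new = X @ S

The result is a new sparse matrix with len(columns_to_use) columns.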

1 Answer


I found the following solution:

from scipy.sparse import hstack

# Slice out the wanted columns one at a time and stack them side by side.
cols = []
for i in columns_to_use:
    cols.append(X[:, i])
X_new = hstack(cols)

It works fast enough, without any errors, and it is simple.
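A note on the above: slicing one column at a time is cheap here because X is already stored in CSC format, and scipy.sparse.hstack accepts a format argument if you need the stacked result back in a particular format, e.g. hstack(cols, format='csc').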

malugina