
I have a sparse matrix X

<1000000x153047 sparse matrix of type '<class 'numpy.float64'>'
with 5082518 stored elements in Compressed Sparse Column format>

and I have an array

columns_to_use 

It consists of 10000 column indices of matrix X. I want to keep only these columns and drop the rest. I tried the following code:

X_new = X[:, columns_to_use]

This works fine with a small X (10,000 rows), but with 100,000 rows or more I get a memory error. How can I select specific columns without running into a memory error?

malugina
  • Sparse column selection is done with matrix multiplication, as described in https://stackoverflow.com/questions/39500649/sparse-matrix-slicing-using-list-of-int. Such a selection will create a new sparse matrix. Have you been able to do any other calculations with this large matrix? Make a copy? – hpaulj Jan 04 '18 at 17:06
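For reference, here is a minimal sketch of the matrix-multiplication selection mentioned in the comment above, assuming X is the CSC matrix and columns_to_use is a 1-D array of column indices as in the question:

import numpy as np
from scipy import sparse

# Build a selector matrix S of shape (X.shape[1], len(columns_to_use)) with a
# single 1 per column; X @ S then keeps exactly the requested columns, in order.
n_selected = len(columns_to_use)
S = sparse.csc_matrix(
    (np.ones(n_selected), (columns_to_use, np.arange(n_selected))),
    shape=(X.shape[1], n_selected),
)
X_new = X @ S

The result is a new sparse matrix with len(columns_to_use) columns.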

1 Answer


I found the following solution:

from scipy.sparse import hstack

# Slice out the wanted columns one at a time and stack them side by side.
cols = []
for i in columns_to_use:
    cols.append(X[:, i])
X_new = hstack(cols)

It works fast enough, without any errors, and it is simple.
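A note on the above: slicing one column at a time is cheap here because X is already stored in CSC format, and scipy.sparse.hstack accepts a format argument if you need the stacked result back in a particular format, e.g. hstack(cols, format='csc').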

malugina