I'm trying to create a list of SparseSeries from a SciPy sparse matrix. Creating the lil_matrix is fast and does not consume much memory. In reality my dimensions are on the order of millions, i.e. 15 million samples and 4 million features.

I have read a previous topic on this, but that solution also seems to eat up all my memory and freezes my computer. On the surface it looks like the pandas SparseSeries is not really sparse, or am I doing something wrong? The ultimate goal is to create a SparseDataFrame from this (just like in the other topic I referred to).
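A rough estimate helps show where memory could go at the quoted sizes. This sketch only does arithmetic and allocates nothing. It assumes float64 values (8 bytes), int32 column indices and row pointers in CSR format (4 bytes each), and at most 4 nonzeros per sample as in the example code.

```python
# Back-of-the-envelope memory estimate for 15 million samples x 4 million features.
# Assumptions: float64 values, int32 indices/row pointers, 4 nonzeros per row.
nsamples = 15 * 10**6
nfeatures = 4 * 10**6
nnz_per_row = 4

# Fully dense matrix, and a single dense row as produced by .toarray()
dense_bytes = nsamples * nfeatures * 8
dense_row_bytes = nfeatures * 8

# CSR storage: data (8 B) + column indices (4 B) per nonzero, plus row pointers
nnz = nsamples * nnz_per_row
csr_bytes = nnz * 8 + nnz * 4 + (nsamples + 1) * 4

print("dense matrix:  %.1f TB" % (dense_bytes / 1e12))     # dense matrix:  480.0 TB
print("one dense row: %.1f MB" % (dense_row_bytes / 1e6))  # one dense row: 32.0 MB
print("CSR matrix:    %.2f GB" % (csr_bytes / 1e9))        # CSR matrix:    0.78 GB
```

Under these assumptions the sparse matrix itself fits comfortably in RAM, which suggests the memory is going into whatever is built from it rather than into the matrix.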
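    from scipy.sparse import lil_matrix
    from numpy import random
    import pandas as pd

    nsamples = 10**5
    nfeatures = 10**4

    # Build a random 0/1 matrix with (up to) 4 nonzeros per row
    rm = lil_matrix((nsamples, nfeatures))
    for i in range(nsamples):
        index = random.randint(0, nfeatures, size=4)
        rm[i, index] = 1

    # Densify each row, wrap it in a Series, then convert it to a SparseSeries
    # (Series.to_sparse exists only in pandas < 1.0)
    l = []
    for i in range(nsamples):
        l.append(pd.Series(rm[i, :].toarray().ravel()).to_sparse(fill_value=0))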
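For comparison: SparseSeries and SparseDataFrame were removed in pandas 1.0, and newer versions store sparse data in ordinary Series/DataFrame objects with a `SparseDtype`. A minimal sketch, assuming pandas >= 0.25 (where `DataFrame.sparse.from_spmatrix` is available), converts the whole SciPy matrix in one call instead of densifying row by row. The sizes are shrunk for illustration, and the matrix is built from coordinate arrays instead of a Python loop; that construction is an alternative, not the one used above.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Small sizes for illustration; the question uses 10**5 x 10**4.
nsamples, nfeatures = 1000, 200
rng = np.random.RandomState(0)

# Four random feature indices per sample, like the lil_matrix loop,
# but built in one step from (row, col) coordinate arrays.
rows = np.repeat(np.arange(nsamples), 4)
cols = rng.randint(0, nfeatures, size=nsamples * 4)
vals = np.ones(nsamples * 4)
rm = coo_matrix((vals, (rows, cols)), shape=(nsamples, nfeatures)).tocsr()

# Repeated (row, col) pairs are summed when converting from COO; merge them
# explicitly and reset every stored value to 1, as rm[i, index] = 1 does.
rm.sum_duplicates()
rm.data[:] = 1.0

# One call, no per-row densification: each column becomes a
# sparse column with fill value 0.
df = pd.DataFrame.sparse.from_spmatrix(rm)

print(df.shape)           # (1000, 200)
print(df.sparse.density)  # fraction of stored (non-fill) entries
```

The result is column-oriented (one sparse column per feature), so 4 million features would mean 4 million columns. Whether that is workable likely depends on the pandas version and available memory, so trying it on a subset first seems prudent.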