Why does it take so long to create a SparseDataFrame (Python pandas)?

Asked Dec 28 '16 at 21:21

Given the following code (executed in a Jupyter notebook):

In [1]: import pandas as pd
        %time df = pd.SparseDataFrame(index=range(0, 1000), columns=range(0, 1000))

CPU times: user 3.89 s, sys: 30.3 ms, total: 3.92 s
Wall time: 3.92 s

Why does it take so long to create a sparse data frame?

Note that increasing the number of rows seems to make no difference. But when I increase the number of columns from 1000 to, say, 10000, the code seems to take forever, and I have always had to abort it.

Compare this with scipy's sparse matrix:

In [2]: from scipy.sparse import lil_matrix
        %time m = lil_matrix((1000, 1000))

CPU times: user 1.09 ms, sys: 122 µs, total: 1.21 ms
Wall time: 1.18 ms

asked Dec 28 '16 at 21:21

dlorch

That `lil_matrix` is just an empty one. It has a shape, but no values. – hpaulj Dec 28 '16 at 21:26
I've participated in SO questions about creating a sparse matrix from a sparse data frame. The required code might give you a sense of how involved it is. I believe a sparse data frame consists of one sparse data series per column. Evidently the columns are converted individually. – hpaulj Dec 28 '16 at 21:29
http://stackoverflow.com/questions/31084942/pandas-sparse-dataframe-to-sparse-matrix-without-generating-a-dense-matrix-in-m – hpaulj Dec 28 '16 at 22:12
https://github.com/pandas-dev/pandas/issues/16773 – jamesj629 Sep 24 '18 at 17:32
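
To make the two points in the comments concrete (an empty `lil_matrix` stores no values, and a sparse frame is held as one sparse array per column), here is a minimal sketch. It is not the code from the question: it assumes a pandas version that provides `pd.DataFrame.sparse.from_spmatrix` (0.25 or later), since `SparseDataFrame` itself was removed in pandas 1.0. The 1000 x 50 shape and the variable names are illustrative choices only.

```python
import pandas as pd
from scipy.sparse import lil_matrix

# An empty lil_matrix records only its shape; no values are stored yet.
m = lil_matrix((1000, 50))  # illustrative shape, not the one from the question
empty_nnz = m.nnz

# Store a single value so the matrix is no longer empty.
m[0, 0] = 1.0

# Build a DataFrame with sparse columns directly from the SciPy matrix.
# Assumption: pandas >= 0.25, where the `.sparse` accessor exists.
df = pd.DataFrame.sparse.from_spmatrix(m)

# Every column carries its own sparse dtype, i.e. it is stored as a
# separate sparse array rather than as part of one 2-D sparse structure.
all_sparse = all(isinstance(dt, pd.SparseDtype) for dt in df.dtypes)

print(empty_nnz, m.nnz, df.shape, all_sparse)
```

Because the columns are separate objects, it is plausible that construction cost grows with the number of columns rather than the number of rows, which would match what the question observes. This sketch does not measure that.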
