I have a large file I need to load into a dataframe. I will need to work on it for a while. Is there a way of keeping it loaded in memory, so that if my script fails, I will not need to load it again?
- Maybe you can [pickle](http://docs.python.org/2/library/pickle.html) it using [`to_pickle`](http://pandas.pydata.org/pandas-docs/stable/io.html#pickling) – jezrael Jan 14 '16 at 08:28
- And maybe [this](http://matthewrocklin.com/blog/work/2015/03/16/Fast-Serialization/) helps. – jezrael Jan 14 '16 at 09:16
- Thanks! How about other data structures, like numpy matrices or objects? – matlabit Jan 20 '16 at 07:37
- numpy is easy: `pd.DataFrame(numpyarray)` – jezrael Jan 20 '16 at 07:38
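Following up on the pickle suggestion in the comments, a minimal round-trip sketch (the file name df.pkl is an arbitrary example, not from the original):

import pandas as pd

df = pd.DataFrame({'A': list(range(5)), 'B': list(range(5))})
df.to_pickle('df.pkl')         # serialize to disk; cheap to reload after a crash
df = pd.read_pickle('df.pkl')  # restore without re-parsing the original large file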
1 Answer
Here's an example of how one can keep a DataFrame available between runs. For persistent storage beyond RAM, I would recommend looking into HDF5: it's fast, simple, and allows for queries if necessary (see the docs). pandas provides read_hdf() and to_hdf(), analogous to the _csv() methods, but significantly faster.
A simple illustration of storage and retrieval including query (from the docs) would be:
import pandas as pd

df = pd.DataFrame(dict(A=list(range(5)), B=list(range(5))))
df.to_hdf('store_tl.h5', 'table', append=True)          # table format supports appends and queries (needs PyTables)
pd.read_hdf('store_tl.h5', 'table', where=['index>2'])  # query: return only rows with index > 2
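Applied to the original question, the pattern is to cache the parsed DataFrame on disk so that a crashed script can reload it quickly on the next run. A rough sketch under assumed names: cache.h5 is a hypothetical cache path and big_file.csv stands in for the expensive source file.

import os
import pandas as pd

CACHE = 'cache.h5'  # hypothetical cache path

if os.path.exists(CACHE):
    df = pd.read_hdf(CACHE, 'df')     # a previous run already paid the parsing cost
else:
    df = pd.read_csv('big_file.csv')  # stand-in for the slow initial load
    df.to_hdf(CACHE, 'df')            # cache it for the next run

Reading the HDF5 cache back is typically much faster than re-parsing a large text file, which is the main win when a script has to restart.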