1

I have a large file I need to load to a dataframe. I will need to work on it for a while. Is there a way of keeping in loaded in memory, so that if my script fails, I will not need to load it again ?

matlabit
  • 838
  • 2
  • 13
  • 31

1 Answers1

1

Here's an example of how one can keep variables in memory between runs.

For persistent storage beyond RAM, I would recommend looking into HDF5. It's fast, simple, and allows for queries if necessary: (see docs).

It supports .read_hdf() and .to_hdf() similar to the _csv() methods, but is significantly faster.

A simple illustration of storage and retrieval including query (from the docs) would be:

df = DataFrame(dict(A=list(range(5)), B=list(range(5))))
df.to_hdf('store_tl.h5','table', append=True)
read_hdf('store_tl.h5', 'table', where = ['index>2'])
Community
  • 1
  • 1
Stefan
  • 41,759
  • 13
  • 76
  • 81