I have a dict containing several pandas DataFrames (identified by keys). Any suggestions on how to serialize it efficiently (and load it back cleanly)? Here is the structure (pprint output). Each dict['method_x_']['meas_x_'] is a pandas DataFrame. The goal is to save the DataFrames for later plotting with some specific plotting options.

{'method1':

{'meas1':

                          config1   config2
                   0      0.193647  0.204673
                   1      0.251833  0.284560
                   2      0.227573  0.220327,
'meas2':   
                          config1   config2
                   0      0.172787  0.147287
                   1      0.061560  0.094000
                   2      0.045133  0.034760},

'method2':

{ 'meas1':

                          config1   config2
                   0      0.193647  0.204673
                   1      0.251833  0.284560
                   2      0.227573  0.220327,

'meas2':

                          config1   config2
                   0      0.172787  0.147287
                   1      0.061560  0.094000
                   2      0.045133  0.034760}}
Wajih

2 Answers

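Use pickle.dump(s) and pickle.load(s); they work on a dict of DataFrames as-is. Pandas DataFrames also have their own method for serializing a single DataFrame: df.save("filename") in old pandas versions, since replaced by df.to_pickle("filename") (read it back with pd.read_pickle("filename")).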
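For illustration, here is a minimal sketch of the pickle round trip on a nested dict shaped like the one in the question. The file name all_df.pkl, the temporary directory, and the sample values are assumptions made for the example, not something from the original post.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical data mirroring the question's layout: method -> measurement -> DataFrame.
all_df = {
    "method1": {
        "meas1": pd.DataFrame({"config1": [0.193647, 0.251833], "config2": [0.204673, 0.284560]}),
        "meas2": pd.DataFrame({"config1": [0.172787, 0.061560], "config2": [0.147287, 0.094000]}),
    },
    "method2": {
        "meas1": pd.DataFrame({"config1": [0.227573], "config2": [0.220327]}),
    },
}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "all_df.pkl")  # assumed file name

    # Open in binary mode; the context manager closes the file after writing.
    with open(path, "wb") as fh:
        pickle.dump(all_df, fh, protocol=pickle.HIGHEST_PROTOCOL)

    # Load the whole nested dict back in one call.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)
```

Opening the file in a with block ensures it is flushed and closed before anything tries to read it back. Keep in mind that pickle files should only be loaded from sources you trust.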

sjakobi

In my particular use case, I tried to do a simple pickle.dump(all_df, open("all_df.p","wb"))

And while it loaded properly with all_df = pickle.load(open("all_df.p","rb"))

when I restarted my Jupyter environment I would get an UnpicklingError: invalid load key, '\xef'.

One of the methods described here states that we can use HDF5 (PyTables) to do the job. From their docs:

HDFStore is a dict-like object which reads and writes pandas

But it seems to be picky about the tables version that you use. I got mine to work after a pip install --upgrade tables and a runtime restart.

If you need an overall idea of how to use it:

import pandas as pd

# all_df is assumed to be a flat dict mapping string keys to DataFrames
with pd.HDFStore('df_store.h5') as df_store:
    for key in all_df.keys():
        df_store[key] = all_df[key]

You should have a df_store.h5 file that you can convert back using the reverse process:

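new_all_df = dict()
with pd.HDFStore('df_store.h5') as df_store:
    for key in df_store.keys():
        # HDFStore.keys() returns paths with a leading '/', so strip it to recover the original key
        new_all_df[key.lstrip('/')] = df_store[key]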
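Note that the loops above handle a flat dict, while the dict in the question is nested two levels deep (method, then measurement). One possible way to bridge that, sketched here with hypothetical helper names flatten_frames and unflatten_frames, is to join the two levels into a single path-like key before storing and split it again after loading. Placeholder strings stand in for the DataFrames so the sketch runs without PyTables.

```python
def flatten_frames(nested, sep="/"):
    """Turn {'method1': {'meas1': df}} into {'method1/meas1': df}."""
    return {
        f"{method}{sep}{meas}": frame
        for method, inner in nested.items()
        for meas, frame in inner.items()
    }


def unflatten_frames(flat, sep="/"):
    """Rebuild the two-level dict; tolerates a leading separator on the keys."""
    nested = {}
    for key, frame in flat.items():
        method, meas = key.strip(sep).split(sep, 1)
        nested.setdefault(method, {})[meas] = frame
    return nested


# Placeholder values; in practice these would be the DataFrames.
nested = {
    "method1": {"meas1": "df_a", "meas2": "df_b"},
    "method2": {"meas1": "df_c"},
}

flat = flatten_frames(nested)
rebuilt = unflatten_frames(flat)
```

The flat dict can then go through the HDFStore write loop unchanged, and unflatten_frames can be applied to whatever comes back, since it strips the leading '/' that HDFStore.keys() puts on each key.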
Johnny Bigoode
  • Thanks for that. The h5py.org documentation is a nightmare but perhaps that's simply because hdf5 is a nightmare. – user3673 May 13 '21 at 20:47