I am trying to store some extra information directly in a DataFrame, such as parameters describing the data it holds.

I added this information as extra attributes on the DataFrame:

df.data_origin = 'my_origin'
print(df.data_origin)

But when it is saved and loaded, those extra attributes are lost:

df.to_pickle('pickle_test.pkl')
df2 = pd.read_pickle('pickle_test.pkl')
print(len(df2))
print(df2.data_origin)
...
465387
>>> AttributeError: 'DataFrame' object has no attribute 'data_origin'

The workaround I have found is to pickle the __dict__ of the DataFrame and then assign it to the __dict__ of an empty DataFrame:

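import pickle
import pandas as pd

# Save the instance __dict__ (data and extra attributes together).
with open('modified_dataframe.pkl', "wb") as pkl_out:
    pickle.dump(df.__dict__, pkl_out)
# Restore it into an empty DataFrame.
df2 = pd.DataFrame()
with open('modified_dataframe.pkl', "rb") as pkl_in:
    df2.__dict__ = pickle.load(pkl_in)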

print(len(df2))
print(df2.data_origin)
...
465387
my_origin

It seems to work, but:

  • Is there a better way to do it?
  • Am I losing information? (apparently, all the data is there)
  • Here a different solution is discussed, but I would like to know if the approach of saving the dict of a class is valid to hold its entire information.

EDIT: OK, I found the big drawback. This works fine for a single DataFrame saved in its own file, but not for dictionaries, lists or other containers with DataFrames in them.

rpicatoste
    Related: [Saving in a file an array or DataFrame together with other information](https://stackoverflow.com/questions/49740190/saving-in-a-file-an-array-or-dataframe-together-with-other-information) – jpp Oct 19 '18 at 12:01

1 Answer

I suggest subclassing pandas.DataFrame: create a new class that inherits from pandas.DataFrame and add the attributes you want there. This may seem a bit spooky, but you can use the subclass safely in different places. Other approaches might be more useful for specific cases, though.
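A minimal sketch of what I mean, not a definitive implementation. The class name OriginFrame is hypothetical. It relies on the pandas subclassing hooks _metadata and _constructor; as far as I know, attributes named in _metadata are part of the pickled state, so they should also survive when the frame sits inside a dict or list:

```python
import pickle

import pandas as pd


class OriginFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass carrying one extra attribute."""

    # Attribute names listed here are treated by pandas as metadata of the
    # object and are included in its pickled state.
    _metadata = ["data_origin"]

    @property
    def _constructor(self):
        # Make pandas operations return OriginFrame instead of DataFrame.
        return OriginFrame


df = OriginFrame({"a": [1, 2, 3]})
df.data_origin = "my_origin"

# Round-trip through pickle while the frame is stored inside a dict.
restored = pickle.loads(pickle.dumps({"frame": df}))
print(type(restored["frame"]).__name__)
print(restored["frame"].data_origin)
```

The class has to be importable wherever the file is loaded, because pickle stores only a reference to the class and not its code.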

null
  • Do you mean like this? (it did not work so far) https://gist.github.com/rpicatoste/0d97488bcc247556ef1ccb9ccae497eb – rpicatoste Oct 19 '18 at 09:17