I'm trying to implement a df.name attribute for my DataFrames. I have a lot of reasons to do this and to store other metadata in my class that inherits from pd.DataFrame, but I won't get into that here; 'name' is just an example of metadata. If this doesn't work for something simple, then I'll use a different approach entirely.
Research:
DataFrame.name won't survive pickling. The only built-in option is the experimental df.attrs['name'], which I don't trust, and I don't like its access syntax. I'll consider that option if this doesn't work.
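For reference, a minimal sketch of what the attrs route would look like, assuming a reasonably recent pandas where DataFrame.attrs exists (it is still marked experimental). The column name and the in-memory round trip through pickle.dumps/pickle.loads are only for illustration:

```python
import pickle

import pandas as pd

# A plain DataFrame; "a" is just a placeholder column for the example.
df = pd.DataFrame({"a": [1, 2, 3]})

# attrs is a dict attached to the DataFrame for user metadata.
df.attrs["name"] = "mytestname"

# Round-trip through pickle in memory instead of writing a file.
restored = pickle.loads(pickle.dumps(df))

# .get() avoids a KeyError if the metadata was not carried over.
print(restored.attrs.get("name"))
```

The access is dictionary-style (df.attrs['name']) rather than attribute-style (df.name), which is the part I don't like.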
Get the name of a pandas DataFrame
There's a discussion about adding the df.name attribute on the pandas GitHub: https://github.com/pandas-dev/pandas/issues/447#issuecomment-10949838
But it's gridlocked over disagreement about use cases and implementation difficulties. No movement in 11 years...
How to pickle yourself: I'm trying to override the df.to_pickle() and pd.read_pickle() methods based on this example: How to pickle yourself?
import pickle

import pandas as pd


class NamedDataFrame(pd.DataFrame):
    '''
    a dataframe with a name
    '''
    def __init__(self, data=None, index=None, columns=None, dtype=None, copy=False, name: str = None):
        super().__init__(data, index, columns, dtype, copy)
        self.name = name

    # override the pickling methods to include the name
    def to_pickle(self, path, compression='infer', protocol=4):
        print("pickling myself")
        with open(path, 'wb') as f:
            pickle.dump(self, f, protocol)

    @classmethod
    def read_pickle(cls, path):
        with open(path, 'rb') as f:
            return pickle.load(f)
But it doesn't work:
>>> ndf = NamedDataFrame(data=mydf,name='mytestname')
>>> ndf.name
mytestname
>>> ndf.to_pickle(mypath)
pickling myself
>>> pndf = NamedDataFrame.read_pickle(mypath)
>>> pndf
(shows dataframe output to confirm reading from pickle worked)
>>> pndf.name
AttributeError: 'NamedDataFrame' object has no attribute 'name'
What gives? It seems like I'm missing something huge here on how pickling works, and I'd like to understand what I'm missing, and hopefully find a solution to this problem.
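One direction that may be relevant, sketched here under assumptions rather than as a confirmed fix: pandas objects define their own __getstate__/__setstate__, so a plain instance attribute set in __init__ is not necessarily part of the pickled state, and the pandas subclassing docs describe a _metadata class attribute for declaring extra properties. The variation below keeps the NamedDataFrame name from above; the *args/**kwargs signature and the in-memory round trip are choices made only for this example:

```python
import pickle

import pandas as pd


class NamedDataFrame(pd.DataFrame):
    """A DataFrame subclass that declares `name` as pandas metadata."""

    # Attributes listed here are the ones pandas is told about as
    # subclass properties (see the "Extending pandas" documentation).
    _metadata = ["name"]

    def __init__(self, *args, name=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name

    @property
    def _constructor(self):
        # Ask pandas to build results of operations as NamedDataFrame.
        return NamedDataFrame


ndf = NamedDataFrame({"a": [1, 2, 3]}, name="mytestname")

# Use the standard pickle machinery; no to_pickle/read_pickle overrides.
restored = pickle.loads(pickle.dumps(ndf))

# getattr with a default avoids an AttributeError if the name was lost.
print(type(restored).__name__, getattr(restored, "name", None))
```

If this behaves as the docs suggest, the to_pickle() and read_pickle() overrides would not be needed at all, but I'd still like to understand why the original version drops the attribute.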