I have defined a subclass of a pandas DataFrame. The subclass is essentially identical to a DataFrame, but adds methods for specialized tasks.
One of the most convenient properties of the pandas DataFrame is that it supports method chaining; that is, DataFrame methods return instances of the DataFrame class.
I want to be able to use these methods, but when I call them on the child class, I get an instance of the parent.
    import pandas as pd

    class MySpecialDF(pd.DataFrame):
        def sqrt(self, colname):
            # square root of a column (exponent 0.5, not 2.0)
            return self[colname] ** 0.5

    df = MySpecialDF({'a': [1, 2, 3], 'b': [4, 5, 6]})
    df.sqrt('a')  # all good!
    df = df.drop('b', axis=1)  # returns a regular DF
    df.sqrt('a')  # AttributeError: 'DataFrame' object has no attribute 'sqrt'
How can I set things up so that these methods return instances of the subclass?
I could manually override individual methods like this:
    class MySpecialDF(pd.DataFrame):
        def sqrt(self, colname):
            # square root of a column (exponent 0.5, not 2.0)
            return self[colname] ** 0.5

        def drop(self, *args, **kwargs):
            # re-wrap the parent's result in the subclass
            return MySpecialDF(super(MySpecialDF, self).drop(*args, **kwargs))
But DataFrames have a lot of such methods, and I don't want to override each one manually.
I thought there might be a way to apply a decorator that wraps each parent method, but I am not sure how to do this or whether it is the right approach.
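To make the decorator idea concrete, here is a rough sketch of what I have in mind. It uses a toy parent class instead of pandas: `Table`, `SpecialTable`, `rewrap_parent_results` and `_rewrap` are all hypothetical names made up for illustration. It assumes single inheritance and that the subclass constructor accepts a parent instance. I have not tried this on `pd.DataFrame` itself, and it may well interfere with pandas internals.

```python
import functools
import inspect


def _rewrap(method, parent, cls):
    """Wrap one parent method so parent-typed results become `cls` instances."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        # Only convert results that are exactly the parent type.
        if type(result) is parent:
            return cls(result)
        return result
    return wrapper


def rewrap_parent_results(cls):
    """Class decorator (hypothetical): re-wrap inherited public methods."""
    parent = cls.__mro__[1]  # assumption: single inheritance
    for name in dir(parent):
        # Skip private/dunder names and anything the subclass already defines.
        if name.startswith('_') or name in cls.__dict__:
            continue
        attr = inspect.getattr_static(parent, name)
        # Plain functions only; classmethods, staticmethods and properties are skipped.
        if inspect.isfunction(attr):
            setattr(cls, name, _rewrap(attr, parent, cls))
    return cls


class Table:
    """Toy stand-in for a parent class such as pd.DataFrame."""
    def __init__(self, rows):
        # Accept either another Table or any iterable of values.
        self.rows = list(rows.rows if isinstance(rows, Table) else rows)

    def head(self, n):
        return Table(self.rows[:n])  # always returns the parent type

    def count(self):
        return len(self.rows)


@rewrap_parent_results
class SpecialTable(Table):
    def total(self):
        return sum(self.rows)


t = SpecialTable([1, 2, 3, 4]).head(3)
print(type(t).__name__, t.total())  # SpecialTable 6
```

With this, `head` is inherited but hands back a `SpecialTable`, so chaining into `total` works. Every wrapped call pays for an extra constructor call, though, which could be expensive for a real DataFrame.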
This problem is general to all cases where a subclass inherits methods that return instances of the parent.
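For example, the same thing happens with a plain subclass of the built-in `list`, without pandas involved at all (`MyList` and `total` are just illustrative names):

```python
class MyList(list):
    def total(self):
        return sum(self)


items = MyList([1, 2, 3])
print(items.total())             # 6, the subclass method is available

sliced = items[:2]               # slicing is implemented by the parent
combined = items + items         # so is concatenation
print(type(sliced).__name__)     # list
print(type(combined).__name__)   # list
```

Both results are plain `list` objects, so calling `sliced.total()` raises an `AttributeError` in the same way as the DataFrame case above.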
Does anyone know how to fix this issue?