I have defined a subclass of a pandas DataFrame. The subclass is essentially identical to a DataFrame, but it has additional methods for specialized tasks.

One of the most convenient properties of the pandas DataFrame is that it supports method chaining; that is, DataFrame methods return instances of the DataFrame class.

I want to be able to use these methods, but when I call them from the child class, I get an instance of the parent.

import pandas as pd

class MySpecialDF(pd.DataFrame):
    def sqrt(self, colname):
        return self[colname]**0.5

df = MySpecialDF({'a':[1,2,3], 'b':[4,5,6]})
df.sqrt('a') # all good!

df = df.drop('b', axis=1) # returns a regular DF
df.sqrt('a') # AttributeError: 'DataFrame' object has no attribute 'sqrt'

How can I set things up so that these methods return instances of the subclass?

I could manually override individual methods like this:

class MySpecialDF(pd.DataFrame):
    def sqrt(self, colname):
        return self[colname]**0.5

    def drop(self, *args, **kwargs):
        return MySpecialDF(super(MySpecialDF, self).drop(*args, **kwargs))

But DataFrames have many such methods, and I don't want to override each one manually.

I thought there might be a way to apply some decorator wrapping each parent method, but I am not sure how to do this or if it is the right approach.

This problem is general to all cases where a subclass inherits methods that return instances of the parent.

Does anyone know how to fix this issue?

Nolan Conaway
    I don't have much experience with this but it looks like piRSquared covered this in [this answer](https://stackoverflow.com/questions/47466255/subclassing-a-pandas-dataframe-updates). – ayhan Feb 27 '18 at 17:20

1 Answer

Thanks to @ayhan for pointing me in the right direction. Following the comment led me to the pandas documentation on subclassing, which shows how this can be accomplished within pandas specifically. The fix is to override the _constructor property so that it returns the subclass:

class MySpecialDF(pd.DataFrame):

    @property
    def _constructor(self):
        return MySpecialDF

    def sqrt(self, colname):
        return self[colname]**0.5
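
As a quick check, here is a minimal, self-contained sketch of the same class used in a chain. The sample data and the variable names (dropped, roots) are only illustrative, and sqrt is written here to take the square root of the column:

```python
import pandas as pd


class MySpecialDF(pd.DataFrame):

    @property
    def _constructor(self):
        # pandas uses this property to build the results of its own methods,
        # so returning the subclass keeps it alive through chained calls.
        return MySpecialDF

    def sqrt(self, colname):
        # Element-wise square root of one column (returns a Series).
        return self[colname] ** 0.5


# Illustrative data, not taken from the question.
df = MySpecialDF({'a': [1, 4, 9], 'b': [4, 5, 6]})

dropped = df.drop('b', axis=1)   # inherited method
print(type(dropped).__name__)    # subclass is preserved

roots = dropped.sqrt('a')        # custom method is still available
print(roots.tolist())
```

Note that selecting a single column still gives a plain Series here, since only the DataFrame constructor is overridden.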

I do not know whether this solves the general problem, where a subclass inherits methods that return instances of the parent. I am not certain a general solution can exist, since the parent may construct the returned instances arbitrarily.
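
To illustrate why the general case depends on how the parent builds its return values, here is a small sketch in plain Python. The classes Parent and Child and their methods are hypothetical, made up for this example; they are not part of pandas:

```python
class Parent:
    def __init__(self, value):
        self.value = value

    def scaled(self, factor):
        # Builds the result from the runtime class of self,
        # so a subclass instance produces a subclass instance.
        return type(self)(self.value * factor)

    def reset(self):
        # Hardcodes the parent class, so the subclass is lost here
        # and the child would have to override this method itself.
        return Parent(0)


class Child(Parent):
    def describe(self):
        return f"Child({self.value})"


child = Child(2)
print(type(child.scaled(3)).__name__)  # built via type(self)
print(type(child.reset()).__name__)    # built via the hardcoded Parent
```

The _constructor property plays roughly the role of type(self) in this sketch: it only helps because pandas methods consistently go through it when building their results.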

Nolan Conaway