
I have created a class that subclasses pandas.DataFrame and initializes itself from a CSV file, as shown here:

import pandas as pd

class CustomDataFrame(pd.DataFrame):

    def __init__(self, input_file):
        df = pd.read_csv(input_file)
        super().__init__(df)
    #...

This way, I have a CustomDataFrame type that has additional specific methods to operate on itself. The problem I have with this setup is that when I take a slice of the object, it returns a pandas.DataFrame object instead of keeping the same type. In other words:

> blip = mypackage.CustomDataFrame('test.csv')

> type(blip)
mypackage.CustomDataFrame

> type(blip[1:3])
pandas.core.frame.DataFrame
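To reproduce this without a file on disk, the sketch below feeds the same class an in-memory CSV through io.StringIO. The sample data is made up for illustration. The class has no `_constructor` override, so pandas builds the slice with the default DataFrame `_constructor`, and the result is a plain DataFrame.

```python
import io

import pandas as pd


class CustomDataFrame(pd.DataFrame):
    def __init__(self, input_file):
        # Load the CSV, then initialize the DataFrame part from the result.
        df = pd.read_csv(input_file)
        super().__init__(df)


# Hypothetical sample data standing in for test.csv.
csv_text = "x,y\n1,10\n2,20\n3,30\n4,40\n"
blip = CustomDataFrame(io.StringIO(csv_text))

print(type(blip).__name__)       # CustomDataFrame
print(type(blip[1:3]).__name__)  # DataFrame: the subclass type is lost
```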

Is there a simple way to correct my custom class such that it can operate on itself in all the ways that a pandas.DataFrame can, while returning this custom class each time rather than just the built-in version of the DataFrame?
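One approach, described in the pandas documentation on subclassing pandas data structures, is to override the `_constructor` property so pandas builds same-shaped results with the subclass. A catch is that pandas then calls the subclass itself with DataFrame-style arguments (a DataFrame or an internal manager object, depending on the version), not a file path. For that reason, the sketch below keeps pandas' own `__init__` and moves the CSV loading into a classmethod. `from_file` and `row_count` are hypothetical names chosen for illustration, and the sketch has not been checked against every pandas release.

```python
import io

import pandas as pd


class CustomDataFrame(pd.DataFrame):
    # No custom __init__: pandas may call the class with a DataFrame (or an
    # internal manager object) when it builds new results, so the default
    # DataFrame constructor signature is kept.

    @property
    def _constructor(self):
        # Results that keep the DataFrame shape (e.g. row slices, head(),
        # column subsets) are built with this class instead of pd.DataFrame.
        return CustomDataFrame

    @classmethod
    def from_file(cls, input_file, **read_csv_kwargs):
        # Hypothetical loader replacing the file-based __init__.
        return cls(pd.read_csv(input_file, **read_csv_kwargs))

    def row_count(self):
        # Hypothetical custom method, just to show one surviving a slice.
        return len(self)


blip = CustomDataFrame.from_file(io.StringIO("x,y\n1,10\n2,20\n3,30\n4,40\n"))
part = blip[1:3]
print(type(part).__name__)  # CustomDataFrame
print(part.row_count())     # 2
```

Selecting a single column still returns a plain Series unless `_constructor_sliced` is overridden as well. Any extra instance attributes you want carried through operations would be listed in `_metadata`.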

teepee
  • I have seen it said by one of the top 3 Pandas answerers that subclassing a DataFrame is a _bad idea_. – roganjosh Dec 31 '18 at 19:05
  • Oh really? That throws a wrench in my plans. Could you please link me to some of those answers? And do you know what a better alternative would be? Thanks. – teepee Dec 31 '18 at 19:30
  • I don't recall exactly what the other issue was, but I spend plenty of time looking at SO questions and this is only the second that I recall that was using subclassing tbh. It might be better to flesh out your plan with one of the custom methods that makes you think you need to subclass in the first place. Why can't you just have an object that applies methods to its own DF attribute? – roganjosh Dec 31 '18 at 19:55
  • https://stackoverflow.com/q/22155951/6361531 – Scott Boston Jan 01 '19 at 06:04
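roganjosh's suggestion, an object that holds the DataFrame as an attribute and applies its own methods to it, avoids relying on pandas' subclassing hooks at all. Below is a minimal sketch of that composition approach. `CustomData`, `from_file` and `total` are hypothetical names, and only indexing and `len()` are delegated. Any other DataFrame operation would have to go through the `.df` attribute explicitly.

```python
import io

import pandas as pd


class CustomData:
    """Holds a DataFrame as an attribute instead of inheriting from it."""

    def __init__(self, df):
        self.df = df

    @classmethod
    def from_file(cls, input_file):
        # Hypothetical loader: read the CSV and wrap the resulting frame.
        return cls(pd.read_csv(input_file))

    def __getitem__(self, key):
        # Delegate indexing to the wrapped frame; re-wrap DataFrame results so
        # slices stay CustomData, and pass Series/scalars through unchanged.
        result = self.df[key]
        if isinstance(result, pd.DataFrame):
            return type(self)(result)
        return result

    def __len__(self):
        return len(self.df)

    def total(self, column):
        # Hypothetical custom method working on the wrapped frame.
        return self.df[column].sum()


data = CustomData.from_file(io.StringIO("x,y\n1,10\n2,20\n3,30\n4,40\n"))
part = data[1:3]
print(type(part).__name__)  # CustomData
print(part.total("x"))      # 5
```

The trade-off is that each DataFrame feature you want on the wrapper has to be forwarded by hand or reached through `.df`. In return, the class no longer depends on underscore-prefixed hooks like `_constructor`.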
