
I have created a class, shown below, that takes a pandas data frame and returns either an aggregate or a sample of it. I can call each method separately, but I cannot chain them the way df.columns.to_list() is chained. How can I make this work?

import pandas as pd    
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')




class MyClass:

    def __init__(self, df):
        self.df=df
    
    def return_agg(self):
        non_num=self.df.select_dtypes(exclude='number').columns.to_list()
        self.df=self.df.groupby(non_num,dropna=False).sum().reset_index()
        return self.df

    def return_sample(self):
        self.sample=self.df.sample(frac=0.1, replace=True, random_state=1)
        return self.sample
    

a = MyClass(iris)
a.return_sample()  # works
a.return_agg()  # works
a.return_sample().return_agg()  # does not work

After making the changes suggested in the answers below, method chaining works, but the result is not what I expected.

a = MyClass(iris)
df1=a.return_agg().df
df2=a.return_sample().return_agg().df
df1
      species  sepal_length  sepal_width  petal_length  petal_width
0      setosa         250.3        171.4          73.1         12.3
1  versicolor         296.8        138.5         213.0         66.3
2   virginica         329.4        148.7         277.6        101.3

df2
      species  sepal_length  sepal_width  petal_length  petal_width
0      setosa         250.3        171.4          73.1         12.3
1  versicolor         296.8        138.5         213.0         66.3
2   virginica         329.4        148.7         277.6        101.3

df2 should differ from df1 because it aggregates the sample rather than the full data frame.

itthrill

1 Answer


In your current implementation, return_sample() returns self.sample, which is a pandas.DataFrame and not a MyClass instance. A pandas DataFrame has no return_agg method, so a.return_sample().return_agg() raises an AttributeError. If return_sample() wraps the sample in a MyClass instance instead, the chained call works:

class MyClass:

    def __init__(self, df):
        self.df = df

    def return_agg(self):
        non_num = self.df.select_dtypes(exclude='number').columns.to_list()
        self.df = self.df.groupby(non_num, dropna=False).sum().reset_index()
        return self.df

    def return_sample(self):
        # Wrap the sample in a MyClass instance so that return_agg() can be chained
        sample_df = self.df.sample(frac=0.1, replace=True, random_state=1)
        self.sample = MyClass(sample_df)
        return self.sample
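
If you want both methods to be chainable in any order, one possible variation (a sketch, not the only way to do it) is to have every method return a new wrapper object and leave self.df alone. The class name ChainableFrame, the frac and random_state parameters, and the small toy frame are made up for illustration; the toy frame stands in for the iris data so the example runs without a download.

```python
import pandas as pd


class ChainableFrame:
    """Hypothetical variant of MyClass: each method returns a new wrapper."""

    def __init__(self, df):
        self.df = df

    def return_agg(self):
        # Group by every non-numeric column and sum the numeric ones
        non_num = self.df.select_dtypes(exclude="number").columns.to_list()
        agg = self.df.groupby(non_num, dropna=False).sum().reset_index()
        return ChainableFrame(agg)  # new object; self.df is not overwritten

    def return_sample(self, frac=0.5, random_state=1):
        sample = self.df.sample(frac=frac, replace=True, random_state=random_state)
        return ChainableFrame(sample)  # new object wrapping only the sample


# Made-up data standing in for the iris data frame
toy = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "length": [1.0, 2.0, 3.0, 4.0],
})

a = ChainableFrame(toy)
df1 = a.return_agg().df                  # aggregate of the full frame
df2 = a.return_sample().return_agg().df  # aggregate of the sample only
```

Because a.df is never reassigned here, calling a.return_agg() first does not change what a later a.return_sample() draws from, which appears to be what made df1 and df2 identical in the question's update.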

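As for the chaining you mentioned in the question: in df.columns.to_list(), columns is an attribute of the data frame, and its value is an instance of another class (pandas.Index) that implements to_list(). Chaining works whenever each step returns an object that has the next attribute or method.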
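
To see this for yourself, here is a small check on a made-up two-column frame (the column names are arbitrary). It shows that df.columns is a separate object with its own to_list() method, and that the data frame itself has no return_agg attribute.

```python
import pandas as pd

# Made-up frame; any data frame behaves the same way
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

cols = df.columns                      # attribute lookup, returns an Index object
print(isinstance(cols, pd.Index))      # True
print(cols.to_list())                  # ['a', 'b']
print(hasattr(df, "return_agg"))       # False: a DataFrame has no such method
```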

ThePyGuy