I would like to create a method inside a class that takes a variable and a function as input arguments and returns a new value. In the example below, the arbitrary function can be max, min, mean, and so on:

import pandas as pd
df = pd.DataFrame( {'col1': [1, 2], 'col2': [4, 6]})
df.max(axis=1), df.min(axis=1), df.mean(axis=1)  # sample of methods that I would like to pass

I would like to do something similar through a method inside a class. My attempt so far, which does not work:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return self.df.func()

ob1 = our_class(df)
ob1.arb_func(max(axis=1))

Any suggestions appreciated.

PS: This is a toy problem. My goal is to take a data frame and run an arbitrary number of statistical analyses on it later. I do not want to hardcode the analyses, so that they can be changed later if needed.

user101464

2 Answers

You could try this:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return func(self.df)

You could then use it like this:

ob1 = our_class(df)
ob1.arb_func(lambda x: x.max(axis=1))
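
Since the stated goal is to run an arbitrary number of analyses, the same method can be called in a loop. The following is only a minimal sketch: the `analyses` dictionary and its keys are hypothetical names chosen for illustration, not part of the question.

```python
import pandas as pd

class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func):
        # func is any callable that takes the data frame as its only argument
        return func(self.df)

df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})
ob1 = our_class(df)

# Hypothetical registry of analyses; swap entries in or out without touching the class
analyses = {
    'row_max': lambda x: x.max(axis=1),
    'row_min': lambda x: x.min(axis=1),
    'row_mean': lambda x: x.mean(axis=1),
}

# Apply every registered function and collect the results by name
results = {name: ob1.arb_func(func) for name, func in analyses.items()}
print(results['row_mean'])
```

Keeping the functions in a plain dictionary means the set of analyses can be changed later without modifying `arb_func`.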
jjramsey
  • Looks good and is actually a good example of Python dispatching in action. The question remains for all of us: why would the OP want to create an (otherwise) unnecessary layer that just adds complexity? – deponovo Jan 27 '22 at 20:54
  • @deponovo Thanks for your good comments so far. I would like (later) to be able to receive all the functions that I need to apply on my data in the form of a list and be able to apply them on my data. I was not sure where to start so my toy question was my first attempt to do it. Probably there are way better approaches that I am not familiar with. – user101464 Jan 27 '22 at 21:14
  • @jjramsey Thank you. This was a very nice solution. – user101464 Jan 28 '22 at 06:57
New suggestion

As long as you make sure the function you pass takes a dataframe as its first argument, the problem becomes as simple as this (as already noted by @jjramsey):

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func):
        return func(self.df)

Virtually any method of pd.DataFrame, i.e. a method that takes self as its first parameter, such as pd.DataFrame.max, is directly compatible with this usage. In this version, you have to pass a partial function whenever you need additional configuration in the form of positional or keyword arguments. In your case, that is axis=1. A small modification to the above implementation accounts for such situations:

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func, *args, **kwargs):
        return func(self.df, *args, **kwargs)
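
To make the difference concrete, here is a small sketch comparing the two calling styles on the data frame from the question. The class names `our_class_v1` and `our_class_v2` are hypothetical and only used to keep both versions side by side.

```python
from functools import partial

import pandas as pd

class our_class_v1():
    """First version: func receives only the data frame."""
    def __init__(self, df):
        self.df = df

    def arb_func(self, func):
        return func(self.df)

class our_class_v2():
    """Second version: extra arguments are forwarded to func."""
    def __init__(self, df):
        self.df = df

    def arb_func(self, func, *args, **kwargs):
        return func(self.df, *args, **kwargs)

df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})

# Version 1: the configuration has to be baked in beforehand with partial
row_max_v1 = our_class_v1(df).arb_func(partial(pd.DataFrame.max, axis=1))

# Version 2: the configuration is passed along with the function reference
row_max_v2 = our_class_v2(df).arb_func(pd.DataFrame.max, axis=1)

print(row_max_v1.equals(row_max_v2))
```

Both calls end up evaluating `pd.DataFrame.max(df, axis=1)`; the second version just moves the argument forwarding into the method.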

This implementation is now generic enough that you can also pass your own functions, as long as their first parameter is the dataframe. For instance, suppose you want to count how many apples you have with your own count_apples function:

def count_apples(df, apples_column_name):
    return df[apples_column_name].eq('apple').sum()

You can then use it like this:

df = pd.DataFrame({"fruits_in_store": ["apple", "apple", "pear", "banana", "papaya"]})
ob1 = our_class(df)  # rebuild the object so that it holds the new dataframe
ob1.arb_func(count_apples, "fruits_in_store")  # passed positionally, it fills `apples_column_name`
ob1.arb_func(count_apples, apples_column_name="fruits_in_store")  # or you can be explicit

Original answer

I assume the OP is trying to generate some generic coding interface for educational purposes?

Here is a suggestion (which, in my opinion, makes the usage far more complex than necessary, as other users have already noted in their comments):

from functools import partial
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

class our_class():
    def __init__(self, df):
        self.df = df
    
    def arb_func(self, func: str, **kwargs):
        # Look up the DataFrame method by name and apply it to the stored dataframe
        return partial(getattr(pd.DataFrame, func), **kwargs)(self.df)

ob1 = our_class(df)
print(ob1.arb_func("max", axis=1))
0    1
1    2
2    3
dtype: int64


print(ob1.arb_func("max", axis=0))
a    3
dtype: int64
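
If both calling conventions should be supported, that is a method name as a string and a function reference, one possible sketch is to resolve strings with `getattr` before calling. This is an assumption about how the two versions could be combined, not something the suggestions above implement.

```python
import pandas as pd

class our_class():
    def __init__(self, df):
        self.df = df

    def arb_func(self, func, *args, **kwargs):
        # Assumption: a string is interpreted as the name of a pd.DataFrame method
        if isinstance(func, str):
            func = getattr(pd.DataFrame, func)
        # Any callable whose first parameter is the dataframe works from here on
        return func(self.df, *args, **kwargs)

df = pd.DataFrame({'col1': [1, 2], 'col2': [4, 6]})
ob1 = our_class(df)

by_name = ob1.arb_func("max", axis=1)                  # method name as a string
by_reference = ob1.arb_func(pd.DataFrame.max, axis=1)  # function reference
print(by_name.equals(by_reference))
```

An unknown method name raises `AttributeError` from `getattr`, which is usually an acceptable failure mode for a thin wrapper like this.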

deponovo
  • @deponovo, thanks for the new method. I do not seem to be able to get it working with `import pandas as pd`, `df = pd.DataFrame( {'col1': [1, 2], 'col2': [4, 6]})`, `ob1 = our_class(df)` and `ob1.arb_func("max", axis=1)`. – user101464 Jan 28 '22 at 17:04
  • @user101464 I guess you are now trying the `New suggestion`. For that version to work, you have to pass a reference to a function and not a function name. For instance: `ob1.arb_func(pd.DataFrame.max, axis=1)`. – deponovo Jan 29 '22 at 09:17