Say I have a function that performs some calculations on a dataframe whose index holds the variables to be considered and whose columns are the years. For example:

df:

       1980    1981    1982 .....
var1
var2
var3
.
.
.   

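def fun(col_df):
    # col_df is a year label, i.e. one of the columns of df
    var_new = df.loc['var1', col_df] + df.loc['var2', col_df] / df.loc['var3', col_df]
    # add the following year's var_new (assumes df already has a 'var_new' row)
    var_new += df.iloc[:, df.columns.get_loc(col_df + 1)].loc['var_new']
    return var_new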

Now imagine I have a dataframe, frame, in which observations are identified by year and an ID variable. These identifiers are stored as columns.

frame:

        date  ID  var1 var2 var3...
0       1980  1   
1       1980  2
2       1981  1
3       1981  2
4       1982  1
5       1982  2

.
.
.   

I want to make the function fun compatible with the groupby() method in pandas. In particular, my idea is to run

frame.groupby('ID').transform(fun)

after I have transformed frame into the same form as df, so that fun can be applied with no problems. How can I do that? Or is it better to rewrite fun so that it can be applied row by row instead, keeping in mind that I will have to deal with a dataframe of the form of `frame`, which has 2 types of identifiers (instead of 1), both expressed as columns?

user9875321__

1 Answer

The question "pandas groupby transform custom function" covers the outline of what you have to do.
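As a rough sketch of the reshaping step the question asks about, each ID group of frame can be pivoted into the df layout (variables as index, years as columns). The sample values and the helper name to_wide are made up for illustration; only the column names date, ID, var1, var2 and var3 come from the question.

```python
import pandas as pd

# Hypothetical sample data in the long layout described in the question.
frame = pd.DataFrame({
    "date": [1980, 1980, 1981, 1981, 1982, 1982],
    "ID":   [1, 2, 1, 2, 1, 2],
    "var1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "var2": [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    "var3": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
})

def to_wide(group):
    # Illustrative helper: years become the columns,
    # variables become the index, as in df.
    return group.set_index("date")[["var1", "var2", "var3"]].T

# One wide dataframe per ID, keyed by the ID value.
wide_by_id = {id_: to_wide(g) for id_, g in frame.groupby("ID")}

print(wide_by_id[1])
```

Each entry of wide_by_id then has the same shape as df, so a function written for df could be applied to it one ID at a time.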

You can call a function like this:

def f(x, col):
    return df.loc[x.index, col]*x

df['g'] = df.groupby('b')['c'].transform(f, col='d')

print(df)

The function reads the external data (df) directly, and extra parameters such as col are passed to it through transform.
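For reference, here is a self-contained version of the same pattern that can be run as is. The sample values are invented; the column names b, c and d simply mirror the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the snippet above.
df = pd.DataFrame({
    "b": ["x", "x", "y", "y"],
    "c": [1, 2, 3, 4],
    "d": [10, 20, 30, 40],
})

def f(x, col):
    # x is the Series of 'c' values for one group. Its index labels are
    # used to look up the matching rows of column `col` in the outer df.
    return df.loc[x.index, col] * x

# Keyword arguments given to transform are forwarded to f.
df["g"] = df.groupby("b")["c"].transform(f, col="d")

print(df)
```

Because f looks rows up by x.index, the result stays aligned with the original rows whatever the group order is.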

Paul Brennan