How to pass multiple interrelated columns to the function on groupby and agg?

Question

I have the following pandas DataFrame df:

id  col1   col2
1   7      1.2
1   6      0.8
1   12     0.9
1   1      1.1
2   3      2.0
2   6      1.8
3   10     0.7
3   11     0.9
3   12     1.2

Here is the code to create this df:

import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3], 
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

I need to group by id and apply the function myfunc to each group. The problem is that myfunc needs several interrelated columns as input. The final goal is to create a new column new_col that holds the result for each id.

How can I do it?

This is my current code:

def myfunc(df, col1, col2):

    df1 = col1
    df2 = df[df[col2] < 1][[col1]]
    var1 = df1.iloc[0]
    var2 = df2.iloc[0][0]

    result = var2 - var1

    return result


df["new_col"] = df.groupby("id").agg(myfunc(...??))

[How to make a good and reproducible Pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Nils Werner, Jul 18 '19 at 09:34
Maybe you are looking for [`apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) (instead of `agg`)? – Alexandre B., Jul 18 '19 at 09:38
@AlexandreB.: The function is supposed to take the first value of `col1` as `var1`, and the first value of `col1` among the rows that satisfy the condition on `col2` as `var2`. Then I should calculate the difference between `var1` and `var2`. See the updates. – Fluxy, Jul 18 '19 at 09:40

score 0 · Accepted Answer · answered Jul 18 '19 at 09:41

In groupby-apply, myfunc() is called once per group and is passed the entire group as a DataFrame, with all columns. You can simply select the columns you need from that group:

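def myfunc(g):
    # g is the sub-DataFrame for one id, so every column is available
    var1 = g['col1'].iloc[0]
    # First col1 value among the rows matching the condition on col2
    # (the question uses col2 < 1 and var2 - var1; adjust as needed)
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]

    return var1 / var2

# apply returns one value per id; map broadcasts it back onto the rows
df['new_col'] = df['id'].map(df.groupby("id").apply(myfunc))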
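Adapted to the logic stated in the question (the first col1 value where col2 < 1, minus the first col1 value), a minimal sketch could look like the following. On the sample data, id 2 has no row with col2 < 1, so .iloc[0] would raise an IndexError there; returning NaN in that case is an assumption, not something the question specifies. The helper name first_diff is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def first_diff(g):
    # Hypothetical helper: g holds the col1/col2 rows of a single id
    var1 = g['col1'].iloc[0]
    # col1 values of the rows where col2 < 1 (can be empty, e.g. id 2)
    matches = g.loc[g['col2'] < 1, 'col1']
    if matches.empty:
        # Assumption: no matching row means "no result" for this id
        return float('nan')
    return matches.iloc[0] - var1

# One value per id, indexed by id
per_id = df.groupby('id')[['col1', 'col2']].apply(first_diff)

# Look up each row's id in per_id so the result has one entry per row
df['new_col'] = df['id'].map(per_id)
print(df)
```

Selecting ['col1', 'col2'] before apply keeps the grouping column out of the frame passed to the function, which the function does not need anyway.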

Nils Werner

Can I do something like this?: `df[['new_col1','new_col2']] = df.groupby("id").apply(myfunc1,myfunc2)` – Fluxy Jul 18 '19 at 09:45
[Yes](https://stackoverflow.com/questions/10751127/returning-multiple-values-from-pandas-apply-on-a-dataframe), but please stay on topic here. – Nils Werner Jul 18 '19 at 09:48
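
As a rough illustration of the linked approach: apply takes a single function (any further positional arguments are forwarded to that function), so two results per group are usually returned from one function as a labelled Series. The sketch below assumes the sample data from the question; the helper name two_values and the choice of ratio and difference as the two outputs are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 1, 2, 2, 3, 3, 3],
                   'col1': [7, 6, 12, 1, 3, 6, 10, 11, 12],
                   'col2': [1.2, 0.8, 0.9, 1.1, 2.0, 1.8, 0.7, 0.9, 1.2]})

def two_values(g):
    # Hypothetical helper: compute two results for one id in a single pass
    var1 = g['col1'].iloc[0]
    # Every id in the sample data has at least one row with col2 > 1
    var2 = g.loc[g['col2'] > 1, 'col1'].iloc[0]
    # The Series labels become the column names of the combined result
    return pd.Series({'new_col1': var1 / var2, 'new_col2': var2 - var1})

# DataFrame indexed by id with columns new_col1 and new_col2
per_id = df.groupby('id')[['col1', 'col2']].apply(two_values)

# Attach both per-id columns to every row of the original frame
df = df.join(per_id, on='id')
print(df)
```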
