I have the following pandas DataFrame df:
id col1 col2
1 7 1.2
1 6 0.8
1 12 0.9
1 1 1.1
2 3 2.0
2 6 1.8
3 10 0.7
3 11 0.9
3 12 1.2
Here is the code to create this df:
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3],
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})
I need to group by id and apply the function myfunc to each group. The problem is that myfunc requires several interrelated columns as input. The final goal is to create a new column new_col for each id.
How can I do it?
This is my current code:
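def myfunc(df, col1, col2):
    df1 = df[col1]                  # col1 values of the group
    df2 = df[df[col2] < 1][[col1]]  # col1 values of the rows where col2 < 1
    var1 = df1.iloc[0]              # first col1 value in the group
    var2 = df2.iloc[0, 0]           # first col1 value among the rows with col2 < 1
    result = var2 - var1
    return result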
df["new_col"] = df.groupby("id").agg(myfunc(...??))
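For reference, one possible way to wire this up is sketched below. It is only a sketch under a few assumptions that are not stated in the question: the function receives the whole sub-DataFrame of each group through groupby(...).apply rather than agg, a group with no row where col2 < 1 should yield NaN (id 2 is such a group in the sample data), and the per-group result should be repeated on every row of that id. The names per_id and below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,1,2,2,3,3,3],
                   'col1': [7,6,12,1,3,6,10,11,12],
                   'col2': [1.2,0.8,0.9,1.1,2.0,1.8,0.7,0.9,1.2]})

def myfunc(group, col1, col2):
    # `group` is the sub-DataFrame holding the rows of a single id
    first = group[col1].iloc[0]               # first col1 value in the group
    below = group.loc[group[col2] < 1, col1]  # col1 values where col2 < 1
    if below.empty:
        # Assumption: no qualifying row means "no result" for this id
        return float('nan')
    return below.iloc[0] - first

# apply() passes each group to myfunc; extra keyword arguments are forwarded.
# Selecting the two columns explicitly keeps the grouping column out of `group`.
per_id = df.groupby("id")[["col1", "col2"]].apply(myfunc, col1="col1", col2="col2")

# Broadcast the one-value-per-id result back onto the original rows
df["new_col"] = df["id"].map(per_id)
print(df)
```

With the sample data, this gives -1.0 for id 1 (6 - 7), NaN for id 2 and 0.0 for id 3 (10 - 10). apply is used here instead of agg because agg works column by column, whereas this function needs to see col1 and col2 together.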