I have a function that I wish to apply to subsets of a pandas DataFrame, so that the function is calculated on all rows (up to and including the current row) from the same group, i.e. using a `groupby` and then `expanding`.
For example, this DataFrame:

```python
import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [1, 2, 3, 1, 2, 3],
        'x1': [10, 40, 30, 100, 200, 300],
        'x2': [1, 2, 1, 2, 0, 3],
    }
).sort_values('time')
```
which looks like this:

```
  group  time   x1  x2
0     A     1   10   1
3     B     1  100   2
1     A     2   40   2
4     B     2  200   0
2     A     3   30   1
5     B     3  300   3
```
and this function, for example:

```python
def foo(_df):
    return _df['x1'].max() * _df['x2'].iloc[-1]
```
[Edited for clarity following feedback from jezrael: my actual function is more complicated and cannot easily be broken down into components for this task. This simple function is just for an MCVE.]
I want to do something like:

```python
df['foo_result'] = df.groupby('group').expanding().apply(foo, raw=False)
```
to obtain this result:

```
  group  time   x1  x2  foo_result
0     A     1   10   1          10
3     B     1  100   2         200
1     A     2   40   2          80
4     B     2  200   0           0
2     A     3   30   1          40
5     B     3  300   3         900
```
The problem is that running `df.groupby('group').expanding().apply(foo, raw=False)` results in `KeyError: 'x1'`.
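The `KeyError` is consistent with how `expanding().apply` works: the function is called separately for each column and each window, and with `raw=False` it receives a one-column `Series`, not a DataFrame slice. So `_df['x1']` looks up an index label `'x1'` that does not exist. The sketch below (the helper `record_window` and the reduced data are illustrative only) records what the function actually receives:

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B'],
    'x1': [10, 40, 100],
    'x2': [1, 2, 2],
})

seen = []

def record_window(window):
    # Keep every object that expanding().apply passes in, for inspection
    seen.append(window)
    return float(len(window))

# Select the value columns explicitly so only x1 and x2 are windowed
df.groupby('group')[['x1', 'x2']].expanding().apply(record_window, raw=False)

# Each call gets a Series (one column's window), never a DataFrame
print({type(w).__name__ for w in seen})
# The column names are not labels of that Series, hence the KeyError
print(any('x1' in w.index for w in seen))
```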
Is there a correct way to do this, or is it not possible in pandas without breaking my function down into components?
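One workaround that keeps `foo` intact, offered as a sketch rather than as a built-in pandas feature: skip `expanding().apply` and loop explicitly over each group's growing prefixes, passing real DataFrame slices to the function. The helper name `expanding_group_apply` is hypothetical, and the approach is quadratic in group size, so it may be slow on large groups.

```python
import pandas as pd

def foo(_df):
    # Example function from the question: needs several columns at once
    return _df['x1'].max() * _df['x2'].iloc[-1]

def expanding_group_apply(df, by, func):
    """Hypothetical helper: call func on every expanding prefix of each group.

    Assumes df is already in the desired (here: time) order, since groupby
    keeps the original row order within each group. Returns a float Series
    aligned to df.index.
    """
    out = pd.Series(index=df.index, dtype=float)
    for _, g in df.groupby(by, sort=False):
        for i in range(len(g)):
            # g.iloc[:i + 1] is a DataFrame with all columns, rows 0..i of the group
            out.loc[g.index[i]] = func(g.iloc[:i + 1])
    return out

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': [1, 2, 3, 1, 2, 3],
    'x1': [10, 40, 30, 100, 200, 300],
    'x2': [1, 2, 1, 2, 0, 3],
}).sort_values('time')

df['foo_result'] = expanding_group_apply(df, 'group', foo)
print(df)
```

Because each call receives an ordinary DataFrame, the function can use any combination of columns; the trade-off is a Python-level loop instead of pandas' windowing machinery.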