Apply expanding function on dataframe

Question

I have a function that I want to apply to subsets of a pandas DataFrame, so that it is calculated on all rows up to and including the current row within the same group, i.e. using a groupby and then expanding.

For example, this dataframe:

import pandas as pd

df = pd.DataFrame.from_dict(
    {
        'group': ['A', 'A', 'A', 'B', 'B', 'B'],
        'time': [1, 2, 3, 1, 2, 3],
        'x1': [10, 40, 30, 100, 200, 300],
        'x2': [1, 2, 1, 2, 0, 3],
    }
).sort_values('time')

i.e.

    group   time    x1      x2
0   A       1       10      1
3   B       1       100     2
1   A       2       40      2
4   B       2       200     0
2   A       3       30      1
5   B       3       300     3

and this function, for example:

def foo(_df):
    return _df['x1'].max() * _df['x2'].iloc[-1]

[Edited for clarity following feedback from jezrael: my actual function is more complicated and cannot easily be broken down into components for this task. This simple function is just for an MCVE.]

I want to do something like: df['foo_result'] = df.groupby('group').expanding().apply(foo, raw=False)

To obtain this result:

    group   time    x1  x2  foo_result
0   A       1       10  1   10
3   B       1       100 2   200
1   A       2       40  2   80
4   B       2       200 0   0
2   A       3       30  1   40
5   B       3       300 3   900

The problem is that running df.groupby('group').expanding().apply(foo, raw=False) raises KeyError: 'x1'.

Is there a correct way to run this, or is it not possible to do so in pandas without breaking down my function into components?

What do you mean by ```_df['x2'].iloc[-1]```? Previous row value of ```x2```? It doesn't seem like that from your expected output (it seems like you are taking current row there...) — Grzegorz Skibinski, Jan 19 '20 at 12:12

Itamar Mushkin · Answer 1 · 2022-03-07T09:02:47.553

Applying a DataFrame function to an expanding window is apparently not possible (at least not in pandas 0.23.0, and also not in 1.3.0), as one can see by putting a print statement inside the function.

Running df.groupby('group').expanding().apply(lambda x: bool(print(x)) , raw=False) on the given DataFrame (where the bool around the print is just to get a valid return value) returns:

0    1.0
dtype: float64
0    1.0
1    2.0
dtype: float64
0    1.0
1    2.0
2    3.0
dtype: float64
0    10.0
dtype: float64
0    10.0
1    40.0
dtype: float64
0    10.0
1    40.0
2    30.0
dtype: float64

(and so on; the call also returns a DataFrame with 0.0 in each cell, of course).

This shows that the expanding window works on a column-by-column basis (we see that first the expanding time series is printed, then x1, and so on), and does not really work on a dataframe - so a dataframe function can't be applied to it.
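The same observation can be checked without reading printed output, by recording the type of each object the function receives. The sketch below is only an illustration: `probe` and `seen_types` are hypothetical names, and the `x2` values are taken from the table in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': [1, 2, 3, 1, 2, 3],
    'x1': [10, 40, 30, 100, 200, 300],
    'x2': [1, 2, 1, 2, 0, 3],
}).sort_values('time')

seen_types = []  # hypothetical helper list: type name of every window passed in

def probe(window):
    # Record what kind of object pandas hands to the function
    seen_types.append(type(window).__name__)
    return 0.0  # apply() needs a numeric return value

result = df.groupby('group').expanding().apply(probe, raw=False)
print(set(seen_types))
```

If only Series objects show up, the function never sees more than one column at a time, which is why `_df['x1']` fails inside `foo`.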

So, to get the desired functionality, one would have to put the expanding inside the DataFrame function, as in the accepted answer.
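If the function really cannot be decomposed, another option is to build the expanding windows by hand and call the unchanged function on each growing slice of every group. This is a minimal sketch under that assumption, not a pandas feature: `expanding_apply` is a hypothetical helper, and the `x2` values follow the table in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'A', 'B', 'B', 'B'],
    'time': [1, 2, 3, 1, 2, 3],
    'x1': [10, 40, 30, 100, 200, 300],
    'x2': [1, 2, 1, 2, 0, 3],
}).sort_values('time')

def foo(_df):
    # The unmodified DataFrame function from the question
    return _df['x1'].max() * _df['x2'].iloc[-1]

def expanding_apply(group, func):
    # Hypothetical helper: evaluate func on the first 1, 2, ..., n rows of group
    values = [func(group.iloc[:i]) for i in range(1, len(group) + 1)]
    return pd.Series(values, index=group.index)

# One result Series per group, keyed by the original row labels
parts = [expanding_apply(g, foo) for _, g in df.groupby('group')]

# Assignment aligns on the index, so row order in df does not matter
df['foo_result'] = pd.concat(parts)
print(df)
```

The function is called once per row on a slice that grows with the group, so the cost is quadratic in group size. That is fine for small groups but may be slow for long ones.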

score 1 · Accepted Answer · edited Jun 09 '20 at 07:45

A possible solution is to make the expanding part of the function and use GroupBy.apply:

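def foo1(_df):
    # Expanding max of x1, times the last x2 value in each expanding window
    return _df['x1'].expanding().max() * _df['x2'].expanding().apply(lambda x: x[-1], raw=True)

# Drop the group level so the result aligns with df's original index
df['foo_result'] = df.groupby('group').apply(foo1).reset_index(level=0, drop=True)
print(df)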
  group  time   x1  x2  foo_result
0     A     1   10   1        10.0
3     B     1  100   2       200.0
1     A     2   40   2        80.0
4     B     2  200   0         0.0
2     A     3   30   1        40.0
5     B     3  300   3       900.0

This is not a direct solution to the problem of applying a dataframe function to an expanding dataframe, but it achieves the same functionality.

edited Jun 09 '20 at 07:45

Itamar Mushkin


answered Jan 19 '20 at 12:12

jezrael


Instead of the right side of multiplication- you can do: ```s = g['x1'].expanding().max() // df['foo_result'] = s.reset_index(level=0, drop=True)*df['x2']``` – Grzegorz Skibinski Jan 19 '20 at 13:05

Thank you for your help, but this function was just something I made up for a minimal, reproducible example; Breaking down my actual function to its components this way is not what I need – Itamar Mushkin Jan 19 '20 at 14:25

@ItamarMushkin hmmm, I try answer for `Problem is, functions on .expanding() don't work on entire dataframe, only per column... So, what can I do instead?` – jezrael Jan 19 '20 at 14:28

I see... then my question was not clear enough. I've edited it following your feedback. – Itamar Mushkin Jan 19 '20 at 14:35
