How to set rolling window size by size of each group?

Question

I have a data frame like below:

>df

ID    Value
---------------
1       1.0
1       2.0
1       3.0
1       4.0
2       6.0
2       7.0
2       8.0
3       2.0

I want to calculate min/max/sum/mean/var of the 'Value' column over the last int(group size / 2) records of each group, instead of a fixed number of records.

For ID = 1, apply min/max/sum/mean/var to the 'Value' column of the last 4/2 = 2 records.
For ID = 2, apply min/max/sum/mean/var to the 'Value' column of the last int(3/2) = 1 record.
For ID = 3, apply min/max/sum/mean/var to the 'Value' column of the last 1 record, since the group has only one record.

So the output should be:

             Value
ID    min   max  sum  mean  var
----------------------------------
1     3.0   4.0  7.0  3.5    0.5 # the last 4/2 = 2 rows for group with ID = 1
2     8.0   8.0  8.0  8.0    NaN # the last int(3/2) = 1 row for group with ID = 2
3     2.0   2.0  2.0  2.0    NaN # the last 1 row for group with ID = 3

I am thinking of using the rolling function like below:

df_group = (
    df.groupby('ID')
      .apply(lambda x: x
             .sort_values(by=['ID'])
             .rolling(window=int(x.size/2), min_periods=1)
             .agg({'Value': ['min', 'max', 'sum', 'mean', 'var']})
             .tail(1))
)

But the result turns out to be as below:

                Value
        min max sum    mean  var
ID                      
------------------------------------------------
1   3   1.0 4.0 10.0    2.5 1.666667
2   6   6.0 8.0 21.0    7.0 1.000000
3   7   2.0 2.0 2.0     2.0 NaN

It seems that x.size does not work at all.

Is there any way to set the rolling window size based on the group size?
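
A likely explanation, sketched under the assumption that each group passed to apply still carries both the ID and Value columns: DataFrame.size counts cells (rows times columns), not rows. For a two-column frame, int(x.size/2) is therefore the full group length, which matches the whole-group statistics shown above. The names `sizes`, `group_id` and `group` below are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 2.0]})

# For each group, compare the number of cells (.size) with the number of
# rows (len) and with the window that was actually intended.
sizes = {
    group_id: (group.size, len(group), max(1, len(group) // 2))
    for group_id, group in df.groupby("ID")
}

for group_id, (n_cells, n_rows, window) in sizes.items():
    print(f"ID={group_id}: size={n_cells}, len={n_rows}, intended window={window}")
```

Using len(x) instead of x.size gives the row count. The max(1, ...) guard is an addition here, so that a single-row group does not end up with a window of 0.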

Hi, can you share what you've tried to do as well as an expected result (dataframe or other) ? — David Leon, Feb 11 '20 at 12:48
I've updated the question with what I've done and the expected output. Any hints? — digitalearth, Feb 12 '20 at 00:28
Don't know why you need to roll over the dataframe, see https://stackoverflow.com/a/60223067/3941704 for a possible solution — David Leon, Feb 18 '20 at 13:22

score 0 · Answer 1 · answered Feb 14 '20 at 09:03

A possible solution, starting from this sample data:

import pandas as pd
df = pd.DataFrame(dict(ID=[1,1,1,1,2,2,2,3],
                      Value=[1,2,3,4,6,7,8,2]))

print(df)
##
   ID  Value
0   1      1
1   1      2
2   1      3
3   1      4
4   2      6
5   2      7
6   2      8
7   3      2

Loop over groups as below

#Object to store the result
stats = []

#Group over ID
for ID, Values in df.groupby('ID'):
    # tail: take the last n rows, where n = max(1, int(group length / 2))
    # describe : to get the statistics
    _stat = Values.tail(max(1,int(len(Values)/2)))['Value'].describe()
    #Add group ID to the result
    _stat.loc['ID'] = ID
    #Store the result
    stats.append(_stat)

#Create the new dataframe
pd.DataFrame(stats).set_index('ID')

Result

     count  mean       std  min   25%  50%   75%  max
ID                                                   
1.0    2.0   3.5  0.707107  3.0  3.25  3.5  3.75  4.0
2.0    1.0   8.0       NaN  8.0  8.00  8.0  8.00  8.0
3.0    1.0   2.0       NaN  2.0  2.00  2.0  2.00  2.0
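
describe() reports std rather than var and has no sum, so here is a minimal variant of the same tail-based idea that returns exactly the statistics asked for in the question. It is only a sketch: the helper name `tail_stats` and the variable `result` are made up for illustration, and the frame is assumed to be already ordered within each group.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 1, 2, 2, 2, 3],
                   "Value": [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 2.0]})


def tail_stats(values: pd.Series) -> pd.Series:
    """Aggregate the last max(1, len // 2) values of one group."""
    n = max(1, len(values) // 2)
    return values.tail(n).agg(["min", "max", "sum", "mean", "var"])


# One Series of statistics per group, keyed by group ID
rows = {group_id: tail_stats(group["Value"])
        for group_id, group in df.groupby("ID")}

# A dict of Series becomes one column per group; transpose to one row per group
result = pd.DataFrame(rows).T
result.index.name = "ID"
print(result)
```

The var column is NaN for groups where only one row is kept, because the sample variance (ddof=1) is undefined for a single value. The mean column agrees with the describe() output above.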
