I have a data frame like below:
>df
ID Value
---------------
1 1.0
1 2.0
1 3.0
1 4.0
2 6.0
2 7.0
2 8.0
3 2.0
I want to calculate min/max/sum/mean/var
on 'value' field of last int(group size /2)
records of each group instead of fix number of records.
- For ID =1, apply
min/max/sum/mean/var
on 'value' field of last 4/2=2 records - For ID =2, apply
min/max/sum/mean/var
on 'value' field of last 3/2=1 records. - For ID =3, apply
min/max/sum/mean/var
on 'value' field of last 1 records since it only have one records in the group.
so the output should be
Value
ID min max sum mean var
----------------------------------
1 3.0 4.0 7.0 3.5 0.5 # the last 4/2 rows for group with ID =1
2 7.0 7.0 7.0 7.0 0.5 # the last 3/2 rows for group with ID =2
3 2.0 2.0 2.0 2.0 Nan # the last 1 rows for group with ID =3
I am thinking to use the rolling
function like below:
df_group=df.groupby('ID')
.apply(lambda x: x \
.sort_values(by=['ID'])
.rolling(window=int(x.size/2),min_periods=1)
.agg({'Value':['min','max','sum','mean','var']})
.tail(1)
)
but the result turns out to be as below
Value
min max sum mean var
ID
------------------------------------------------
1 3 1.0 4.0 10.0 2.5 1.666667
2 6 6.0 8.0 21.0 7.0 1.000000
3 7 2.0 2.0 2.0 2.0 NaN
it seems the x.size does not work at all.
Is there any way to set the rolling size based on group size?