5

How can get in pandas groupby rolling mean/median with dropping missing values? I.e. the output should drop missing values before calculating mean/median instead of giving me NaN if a missing value is present.

import pandas as pd
t = pd.DataFrame(data={v.date:[0,0,0,0,1,1,1,1,2,2,2,2],
                         'i0':[0,1,2,3,0,1,2,3,0,1,2,3],
                         'i1':['A']*12,
                         'x':[10.,20.,30.,np.nan,np.nan,21.,np.nan,41.,np.nan,np.nan,32.,42.]})
t.set_index([v.date,'i0','i1'], inplace=True)
t.sort_index(inplace=True)

print(t)
print(t.groupby('date').apply(lambda x: x.rolling(window=2).mean()))

gives

               x
date i0 i1      
0    0  A   10.0
     1  A   20.0
     2  A   30.0
     3  A    NaN
1    0  A    NaN
     1  A   21.0
     2  A    NaN
     3  A   41.0
2    0  A    NaN
     1  A    NaN
     2  A   32.0
     3  A   42.0

               x
date i0 i1      
0    0  A    NaN
     1  A   15.0
     2  A   25.0
     3  A    NaN
1    0  A    NaN
     1  A    NaN
     2  A    NaN
     3  A    NaN
2    0  A    NaN
     1  A    NaN
     2  A    NaN
     3  A   37.0

while I want the following for this example:

               x
date i0 i1      
0    0  A   10.0
     1  A   15.0
     2  A   25.0
     3  A   30.0
1    0  A    NaN
     1  A   21.0
     2  A   21.0
     3  A   41.0
2    0  A    NaN
     1  A    NaN
     2  A   32.0
     3  A   37.0

what I tried

t.groupby('date').apply(lambda x: x.rolling(window=2).dropna().median())

and

t.groupby('date').apply(lambda x: x.rolling(window=2).median(dropna=True))

(both raise exceptions, but maybe there exists something along the lines)

Thank you for your help!

S.V
  • 2,149
  • 2
  • 18
  • 41
  • 1
    https://stackoverflow.com/questions/15771472/pandas-rolling-mean-by-time-interval, adding min_periods – BENY Jul 03 '19 at 14:28

1 Answers1

5

You're looking for min_periods? Note that you don't need apply, callGroupBy.Rolling directly:

t.groupby('date', group_keys=False).rolling(window=2, min_periods=1).mean()
               x
date i0 i1      
0    0  A   10.0
     1  A   15.0
     2  A   25.0
     3  A   30.0
1    0  A    NaN
     1  A   21.0
     2  A   21.0
     3  A   41.0
2    0  A    NaN
     1  A    NaN
     2  A   32.0
     3  A   37.0
cs95
  • 379,657
  • 97
  • 704
  • 746
  • 1
    I feel like stack's server is acting up today , can not see the question in time :-( – BENY Jul 03 '19 at 14:33
  • @WeNYoBen Yeah same. Btw just sae your comment. Is this a dupe? I feel like it is close but the use case for min periods is slightly different here. – cs95 Jul 03 '19 at 14:34
  • I do not think that is dup , since he have two problem , forget adding the min_periods and also rolling should not with apply – BENY Jul 03 '19 at 14:35
  • @cs95 I use apply since there is a [bug in pandas](https://github.com/pandas-dev/pandas/issues/14013), which will only be fixed in pandas-0.25. Just look how messed up your output is if apply is not used (two 'date' index levels). – S.V Jul 03 '19 at 14:39
  • @S.V Didn't notice that, but fixed ;) – cs95 Jul 03 '19 at 14:40