
Not sure if I am doing something wrong (Pandas 1.2.5):

import numpy as np
import pandas as pd

ids = pd.DataFrame(data=range(10), columns=['Id'])
dt = pd.DataFrame(pd.date_range('2021-01-01', '2021-01-10', freq='D'), columns=['Date'])
df = ids.merge(dt, how='cross')
df['Val'] = np.random.randint(1, 10, size=len(df))
df.set_index(['Id', 'Date'], inplace=True)
df['Val'].groupby('Id').rolling(window=3).mean()

I would expect the result to include the Date column (otherwise why compute a rolling mean?) but Date is not there:

Id
0          NaN
0          NaN
0     2.333333
0     3.333333
0     3.666667
        ...   
9     5.000000
9     4.000000
9     5.000000
9     5.333333
9     6.000000
Name: Val, Length: 100, dtype: float64

What am I missing?

Also, df['Val'].reset_index('Id').groupby('Id').rolling(window=3).mean() seems to work, but it returns Id both as a data column and as an index level, even if as_index=False is passed to groupby. Very strange!

                Id  Val
Id  Date        
0   2021-01-01  NaN NaN
    2021-01-02  NaN NaN
    2021-01-03  0.0 7.000000
    2021-01-04  0.0 6.333333
    2021-01-05  0.0 4.666667
... ... ... ...
iggy

1 Answer


I think this is a little cleaner:

import numpy as np
import pandas as pd

ids = pd.DataFrame(data=range(10), columns=['Id'])
dt = pd.DataFrame(pd.date_range('2021-01-01', '2021-01-10', freq='D'), columns=['Date'])
df = ids.merge(dt, how='cross')
df['Val'] = np.random.randint(1, 10, size=len(df))
df.set_index(['Id'], inplace=True)
df.groupby(['Id']).rolling(window=3, on='Date').mean()

The only change is to leave 'Date' out of the index and roll with on='Date'.
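
To make the effect of on='Date' easier to see, here is a minimal sketch with fixed values instead of random ones. The two-Id frame and the names `small` and `out` are made up for illustration; the calls are the same as in the snippet above, and the exact index layout of the result may differ between pandas versions.

```python
import pandas as pd

# Hypothetical toy data: two Ids, four consecutive days each, fixed values.
small = pd.DataFrame({
    "Id": [0, 0, 0, 0, 1, 1, 1, 1],
    "Date": list(pd.date_range("2021-01-01", periods=4, freq="D")) * 2,
    "Val": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Same pattern as above: only Id goes into the index, Date stays a column.
small = small.set_index("Id")

# Roll within each Id. on='Date' names the date column, which is carried
# through to the result instead of being averaged.
out = small.groupby("Id").rolling(window=3, on="Date").mean()

print(out)
```

The first two rows of each Id are NaN because a window of 3 is not yet full, and the window restarts at every new Id.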

Clay Shwery
    Thanks, I see now. The docs are not exactly clear (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html), but it looks like `on` is crucial here. Since we can't refer to an index column after a groupby, pulling `Date` out of the index is the way to do it: `df['Val'].reset_index('Date').groupby('Id').rolling(window=3, on='Date').mean()` – iggy Jun 30 '21 at 22:54
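
If the goal is only to keep the original (Id, Date) index from the question, another option (not the one used in the answer) is `groupby(...).transform` with a per-group rolling mean. This is a sketch with made-up fixed values, assuming the rows are already sorted by Date within each Id; the names `val` and `rolled` are illustrative.

```python
import pandas as pd

# Hypothetical toy Series indexed by (Id, Date), like df['Val'] in the question.
dates = pd.date_range("2021-01-01", periods=4, freq="D")
idx = pd.MultiIndex.from_product([[0, 1], dates], names=["Id", "Date"])
val = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0], index=idx, name="Val")

# transform returns a result aligned to the input, so the (Id, Date)
# index is preserved while the rolling mean is computed per Id.
rolled = val.groupby(level="Id").transform(lambda s: s.rolling(window=3).mean())

print(rolled)
```

Because the result shares the input's index, it can be assigned straight back as a new column of the original frame.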