I am trying to use the pandas rolling function with a window size of 2 together with groupby. This would be fairly standard, except that I want the window to include the current value and the following value (a forward-looking window), rather than the current and previous values.
Specifically, given
import numpy as np
import pandas as pd

df = pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'],
                   'info': [i for i in range(10)]})
I want
pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'],
'info': [i for i in range(10)],
'groupsum':[1, 3, 5, 7, np.nan, 11, 13, 15, 17, np.nan]})
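To pin down the target precisely: for a window of exactly 2, the desired column is the current value plus the next value within the same group. Below is a minimal sketch of that definition using a grouped `shift(-1)` instead of `rolling`. It only covers a window of 2 and is meant as a reference for the expected output, not as a general forward-rolling solution.

```python
import pandas as pd

df = pd.DataFrame({'groups': ['a'] * 5 + ['b'] * 5,
                   'info': list(range(10))})

# shift(-1) within each group pulls the next row's value up by one position;
# the last row of each group has no successor, so its sum is NaN.
next_info = df.groupby('groups')['info'].shift(-1)
df['groupsum'] = df['info'] + next_info

print(df)
# groupsum: 1, 3, 5, 7, NaN, 11, 13, 15, 17, NaN
```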
I have tried two strategies, neither of which worked. First, I tried:
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
df['groupsum'] = df.groupby('groups')['info'].rolling(window=indexer).sum().values
This crashes the kernel, even for this toy dataframe. I am very curious as to why.
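I do not know the cause of the crash. One possible way around it, sketched below, is to apply the forward indexer to each group's Series separately through `transform`, so the custom indexer never goes through the groupby-rolling code path. This assumes a pandas version that provides `FixedForwardWindowIndexer`. Passing `min_periods=2` explicitly is my own choice here, made so that the incomplete window at the end of each group yields NaN.

```python
import pandas as pd

df = pd.DataFrame({'groups': ['a'] * 5 + ['b'] * 5,
                   'info': list(range(10))})

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)

# Roll forward inside each group; transform returns a result aligned
# to the original index, so no .values assignment is needed.
df['groupsum'] = (
    df.groupby('groups')['info']
      .transform(lambda s: s.rolling(window=indexer, min_periods=2).sum())
)

print(df)
```

Because `transform` keeps the original index, the result lines up with the rows of `df` regardless of how the groups are ordered.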
My second approach was to reverse the dataframe and then use a regular groupby rolling operation:
df = df.iloc[::-1].copy()
df.index = range(df.shape[0])
df['groupsum'] = df.groupby('groups')['info'].rolling(2).sum().values
This does not crash the kernel, but it does not yield the dataframe I had hoped for either. Instead, it yields:
pd.DataFrame({'groups':['a','a','a','a','a','b','b','b','b','b'],
'info': [i for i in range(10)],
'groupsum':[np.nan, 7., 5., 3., 1., np.nan, 17., 15., 13., 11.]})
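As far as I can tell, the mismatch comes from assigning with `.values`, which is purely positional. The grouped result comes back in group order and in reversed row order, and it is never mapped back to the original rows. A minimal sketch of the same reversal idea that keeps the original index and lets pandas align on it (the `rolled` name is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'groups': ['a'] * 5 + ['b'] * 5,
                   'info': list(range(10))})

# Roll over the reversed rows, but keep the original index labels.
rolled = (
    df.iloc[::-1]
      .groupby('groups')['info']
      .rolling(2)
      .sum()
      .reset_index(level=0, drop=True)  # drop the group level, keep row labels
)

# Assigning a Series aligns on the index, so each sum lands on its own row.
df['groupsum'] = rolled

print(df)
```

Here the dataframe itself is never reordered; only the rolling computation sees the reversed rows.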
I suppose there is an obvious solution here that I just don't know. Any help is appreciated!