I managed to get what you need with vector operations, so it should be the fastest way possible.
import pandas as pd
data = pd.DataFrame({
"period": [
'3/1/2017', '3/1/2017', '3/1/2017', '3/1/2017', '3/1/2017', '3/1/2017', '3/1/2017',
'4/1/2017', '4/1/2017', '4/1/2017', '4/1/2017', '4/1/2017', '4/1/2017', '4/1/2017'
],
"dimname": [
'fac_cpt', 'fac_cpt', 'fac_cpt', 'fac_cpt', 'fac_cpt', 'ser_cpt', 'ser_cpt',
'fac_cpt', 'fac_cpt', 'fac_cpt', 'fac_cpt', 'fac_cpt', 'ser_cpt', 'ser_cpt'
],
"facility": ['a', 'a', 'a', 'b', 'b', None, None, 'a', 'a', 'a', 'b', 'b', None, None],
"cpt": ['p1', 'p2', 'p3', 'p1', 'p2', 'p1', 'p2', 'p1', 'p2', 'p3', 'p1', 'p2', 'p1', 'p1'],
"ser": [None, None, None, None, None, 'c', 'c', None, None, None, None, None, 'd', 'd'],
"gcr": [1, 10, 2, 3, 8, 12, 4, 4, 10, 2, 4, 11, 6, 2]
})
data.period = data.period.apply(pd.to_datetime)
data[["period", "dimname", "facility", "cpt", "gcr"]].groupby(
['dimname', 'facility', 'cpt']
).rolling(6, min_periods=1, on='period').mean().reset_index(
3, drop=True
).reset_index().rename(columns={'gcr': 'avg6monthgcr'})
# Output:
| dimname | facility | cpt | avg6monthgcr | period
----------------------------------------------------
0 | fac_cpt | a | p1 | 1.0 | 2017-03-01
1 | fac_cpt | a | p1 | 2.5 | 2017-04-01
2 | fac_cpt | a | p2 | 10.0 | 2017-03-01
3 | fac_cpt | a | p2 | 10.0 | 2017-04-01
4 | fac_cpt | a | p3 | 2.0 | 2017-03-01
5 | fac_cpt | a | p3 | 2.0 | 2017-04-01
6 | fac_cpt | b | p1 | 3.0 | 2017-03-01
7 | fac_cpt | b | p1 | 3.5 | 2017-04-01
8 | fac_cpt | b | p2 | 8.0 | 2017-03-01
9 | fac_cpt | b | p2 | 9.5 | 2017-04-01
I timed it on your dataset, but there was only a marginal gain, probably because all the initialization took the majority of the time, not the calculation, so you should give it a try on the full dataset.
# your method:
27.6 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# my method:
24.9 ms ± 2.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
If you need to merge it back to the original dataframe, you should modify the code to keep the original index, because merging is faster on them, so it would look like so:
avg_data = data[["period", "dimname", "facility", "cpt", "gcr"]].groupby(['dimname', 'facility', 'cpt']).rolling(6, min_periods=1, on='period').mean().reset_index(level=3).reset_index(drop=True).set_index('level_3').rename(columns={'gcr': 'avg6monthgcr'}).drop('period', axis=1)
data.merge(avg_data, left_index=True, right_index=True, how='left')