This answer provides a solution to get a rolling sum of a column grouped by another column based on a date window. To reproduce it here:
df = pd.DataFrame(
{
'ID': {0: 10001, 1: 10001, 2: 10001, 3: 10001, 4: 10002, 5: 10002, 6: 10002},
'Date': {
0: datetime.datetime(2019, 7, 1),
1: datetime.datetime(2019, 5, 1),
2: datetime.datetime(2019, 6, 25),
3: datetime.datetime(2019, 5, 27),
4: datetime.datetime(2019, 6, 29),
5: datetime.datetime(2019, 7, 18),
6: datetime.datetime(2019, 7, 15)
},
'Amount': {0: 50, 1: 15, 2: 10, 3: 20, 4: 25, 5: 35, 6: 40},
}
)
amounts = df.groupby(["ID"]).apply(lambda g: g.sort_values('Date').rolling('28d', on='Date').sum())
df['amount_4wk_rolling'] = df["Date"].map(amounts.set_index('Date')['Amount'])
Output:
+-------+------------+--------+--------------------+
| ID | Date | Amount | amount_4wk_rolling |
+-------+------------+--------+--------------------+
| 10001 | 01/07/2019 | 50 | 60 |
| 10001 | 01/05/2019 | 15 | 15 |
| 10001 | 25/06/2019 | 10 | 10 |
| 10001 | 27/05/2019 | 20 | 35 |
| 10002 | 29/06/2019 | 25 | 25 |
| 10002 | 18/07/2019 | 35 | 100 |
| 10002 | 15/07/2019 | 40 | 65 |
+-------+------------+--------+--------------------+
However, if two of the dates are the same then I get the error:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
This makes sense as I can see on the final line that Date
is being used to set an index which is now no longer unique. However, as I don't really understand what that final line does I'm little stumped on trying to develop an alternative solution.
Could someone help out?