I have a massive 5M row list of customer, period, and orders and need to attach rolling window lookups of 3, 6, and 12 months. Below is a sample of the data:
I have a dataframe dftest1:
{'period': {0: 201810, 1: 201811, 2: 201812, 3: 201901, 4: 201902, 5: 201903, 6: 201904, 7: 201905, 8: 201906, 9: 201907, 10: 201908, 11: 201909, 12: 201910, 13: 201911, 14: 201912, 15: 202001, 16: 202002, 17: 202003, 18: 202004, 19: 202005, 20: 202006}, 'customer': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'ABC', 6: 'ABC', 7: 'ABC', 8: 'ABC', 9: 'ABC', 10: 'ABC', 11: 'ABC', 12: 'ABC', 13: 'ABC', 14: 'ABC', 15: 'ABC', 16: 'ABC', 17: 'ABC', 18: 'ABC', 19: 'ABC', 20: 'ABC'}, 'orders': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 1.0, 12: 0.0, 13: 0.0, 14: 0.0, 15: 1.0, 16: 0.0, 17: 2.0, 18: 1.0, 19: 1.0, 20: 1.0}}
Now the following runs and delivers exactly what I want but it takes a very long time to run on my full dataset:
dftest1['countOfOrdersLast3months']=''
dftest1['countOfOrdersLast6months']=''
dftest1['countOfOrdersLast12months']=''
for x, y in dftest1.iterrows():
customer = dftest1['customer'].values[x]
currentDate = dftest1['period'].values[x]
dftest1['countOfOrdersLast3months'].values[x] = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-3) & (dftest1['period'] <= currentDate))]['orders'].sum()
dftest1['countOfOrdersLast6months'].values[x] = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-6) & (dftest1['period'] <= currentDate))]['orders'].sum()
dftest1['countOfOrdersLast12months'].values[x] = dftest1[(dftest1['customer']==customer) & ((dftest1['period'] > currentDate-12) & (dftest1['period'] <= currentDate))]['orders'].sum()
dftest1.head()
Is there a better way to do this faster?