Similar to this post: Excel VLOOKUP equivalent in pandas.
However, I don't need the first value the lookup comes across, but rather the n-th value.
Here is an example data set, with the desired output:
import pandas as pd
from pandas import DataFrame
from datetime import timedelta

data = {'date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-02',
                 '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-04', '2018-01-04',
                 '2018-01-04', '2018-01-05', '2018-01-05', '2018-01-05', '2018-01-06',
                 '2018-01-06'],
        'product': ['123a', '123b', '123c', '123a', '123b', '123a', '123b', '123c',
                    '123a', '123b', '123c', '123a', '123b', '123c', '123a', '123c'],
        'orders': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
        'desired_output': [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 0, 6, 8]}
df = DataFrame(data, columns=['date', 'product', 'orders', 'desired_output'])
df.date = pd.to_datetime(df.date)
df['lag_date'] = df.date - timedelta(days=3)
Example, index 14: product 123a, lag_date 2018-01-03. Look in the date column for product 123a with date 2018-01-03 and return the matching orders value, hence 6; if there is no match, return 0.
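To make that rule concrete, this is the single-row version of the lookup, spelled out with plain boolean masking (just to illustrate what I mean, not a proposed solution):

# Illustrative lookup for index 14: same product, date equal to that row's lag_date.
row = df.loc[14]
match = df.loc[(df['product'] == row['product']) & (df['date'] == row['lag_date']), 'orders']
print(match.iloc[0] if not match.empty else 0)  # prints 6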
Currently, I lag the date by 3 days, but I want this to be a parameter n. I could use the original dates as the index, but then I would need to reindex the data set later (which is fine).
Is there a handy way to do this, instead of looping through all rows with a counter n and taking the value once n matches are found? Since my data set has over 500k rows, looping seems computationally too expensive for a pretty simple task.
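For reference, a sketch of one vectorized idea I've been considering is a self-merge on (product, lag_date). The names n and lagged_orders are my own, and this assumes each (product, date) pair occurs at most once (otherwise the merge would duplicate rows):

# A sketch: join each row to the row n days earlier for the same product.
n = 3
df['lag_date'] = df['date'] - timedelta(days=n)
lookup = df[['product', 'date', 'orders']].rename(
    columns={'date': 'lag_date', 'orders': 'lagged_orders'})
result = df.merge(lookup, on=['product', 'lag_date'], how='left')
# Rows with no match get NaN from the left join; replace with 0 as required.
result['lagged_orders'] = result['lagged_orders'].fillna(0).astype(int)
# On the example data, result['lagged_orders'] matches 'desired_output'.

I don't know whether this is the idiomatic way, or whether it scales well to 500k rows, which is why I'm asking.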