I have two data frames of different sizes (~100k records). Df1 contains customer-ID and Purchase date. Df2 contains customer-ID and Visit date.
I want to create a new column in df1 by counting the number of times the customer has visited the shop (using "Visit Date" from df2), before making a purchase. The condition is that the visit date should be less than 30 days before the Purchase date.
Below is the sample data
df1:
df1 = pd.DataFrame({'Cust ID': [1,2,2,2,3,3], 'Transaction ID':[1001,1002,1003,1004,1005,1006], 'Purchase Date':["1/20/2017", "1/20/2018", "1/20/2017", "1/5/2017","1/20/2017","1/20/2017"]})`
Cust ID Transaction ID Purchase Date
0 1 1001 1/20/2017
1 2 1002 1/20/2018
2 2 1003 1/20/2017
3 2 1004 1/5/2017
4 3 1005 1/20/2017
5 3 1006 1/20/2017
df2:
df2 = pd.DataFrame({'Cust ID': [1,1,1,1,1,2,2,2], 'Visit Date':["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017", "1/23/2017", "1/2/2017","1/3/2017","1/24/2017"]})
Cust ID Store-ID Visit Date
0 1 A1 1/2/2017
1 1 A1 1/3/2017
2 1 A1 1/4/2017
3 1 A1 12/5/2017
4 1 A1 1/23/2017
5 2 A1 1/2/2017
6 2 A1 1/3/2017
7 2 A1 1/24/2017
Expected Output:
Cust ID Transaction ID Purchase Date Count of (Past 1-month visit)
0 1 1001 1/20/2017 3
1 2 1002 1/20/2017 2
2 2 1003 1/20/2018 0
3 2 1004 1/5/2017 2
4 3 1005 1/20/2017 0
5 3 1006 1/20/2017 0
I am pretty new to python and pandas.Your kind help is really appreciated.
Regards Karthik.