0

I have two data frames of different sizes (~100k records). Df1 contains customer-ID and Purchase date. Df2 contains customer-ID and Visit date.

I want to create a new column in df1 by counting the number of times the customer has visited the shop (using "Visit Date" from df2), before making a purchase. The condition is that the visit date should be less than 30 days before the Purchase date.

Below is the sample data

df1:

df1 = pd.DataFrame({'Cust ID': [1,2,2,2,3,3], 'Transaction ID':[1001,1002,1003,1004,1005,1006], 'Purchase Date':["1/20/2017", "1/20/2018", "1/20/2017", "1/5/2017","1/20/2017","1/20/2017"]})`

Cust ID Transaction ID  Purchase Date
0   1   1001    1/20/2017
1   2   1002    1/20/2018
2   2   1003    1/20/2017
3   2   1004    1/5/2017
4   3   1005    1/20/2017
5   3   1006    1/20/2017

df2:

df2 = pd.DataFrame({'Cust ID': [1,1,1,1,1,2,2,2],  'Visit Date':["1/2/2017", "1/3/2017", "1/4/2017", "12/5/2017", "1/23/2017", "1/2/2017","1/3/2017","1/24/2017"]})

    Cust ID Store-ID    Visit Date
0   1   A1  1/2/2017
1   1   A1  1/3/2017
2   1   A1  1/4/2017
3   1   A1  12/5/2017
4   1   A1  1/23/2017
5   2   A1  1/2/2017
6   2   A1  1/3/2017
7   2   A1  1/24/2017

Expected Output:

Cust ID Transaction ID  Purchase Date   Count of (Past 1-month visit)
0   1   1001    1/20/2017   3
1   2   1002    1/20/2017   2
2   2   1003    1/20/2018   0
3   2   1004    1/5/2017    2
4   3   1005    1/20/2017   0
5   3   1006    1/20/2017   0

I am pretty new to python and pandas.Your kind help is really appreciated.

Regards Karthik.

r-beginners
  • 31,170
  • 3
  • 14
  • 32
Karthik
  • 3
  • 2
  • Welcome to SO. Please don't post images, images can't be copied. Take a tour of [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Shubham Sharma Jun 12 '20 at 05:39
  • 2
    Thanks much ! I have edited. Sorry , this is my first time using SO. – Karthik Jun 12 '20 at 07:29

1 Answers1

0

The date of purchase is calculated from the date of the visit and the conditional extracts up to 30 days before the visit are combined with the original 'df1'.

df1['Purchase Date'] = pd.to_datetime(df1['Purchase Date'], format='%m/%d/%Y')
df2['Visit Date'] = pd.to_datetime(df2['Visit Date'], format='%m/%d/%Y')
df3 = df2.merge(df1, on='Cust ID')
df3['Past_1M'] = df3['Purchase Date'] - df3['Visit Date']
import datetime
df3 = df3[(df3['Past_1M'] <= datetime.timedelta(30)) & (df3['Past_1M'] >= datetime.timedelta(0))]
df3 = df3.groupby(['Cust ID', 'Transaction ID']).agg('count').reset_index()
df3 = df1.merge(df3, on=['Cust ID', 'Transaction ID'], how='outer').fillna(0)
df3 = df3.iloc[:,[0,1,2,5]]

df3
    Cust ID Transaction ID  Purchase Date_x Past_1M
0   1   1001    2017-01-20  3.0
1   2   1002    2018-01-20  0.0
2   2   1003    2017-01-20  2.0
3   2   1004    2017-01-05  2.0
4   3   1005    2017-01-20  0.0
5   3   1006    2017-01-20  0.0
r-beginners
  • 31,170
  • 3
  • 14
  • 32