I have two dataframes like as shown below. Already referred the posts here, here, here and here. Don't mark it as duplicate
id,id2,app_date
1,'A',20/3/2017
1,'A',28/8/2017
3,'B',18/10/2017
4,'C',15/2/2017
tf = pd.read_clipboard(sep=',')
tf['app_date'] = pd.to_datetime(tf['app_date'],dayfirst=True)
id,valid_from,valid_to,s_flag
1,20/1/2017,30/4/2017,0
1,28/11/2017,15/2/2018,1
1,18/12/2017,24/2/2018,0
2,15/7/2017,15/11/2017,1
2,2/2/2017,2/6/2017,0
2,11/5/2016,11/6/2016,1
df = pd.read_clipboard(sep=',')
df['valid_from'] = pd.to_datetime(df['valid_from'],dayfirst=True)
df['valid_to'] = pd.to_datetime(df['valid_to'],dayfirst=True)
I would like to do the below
a) check whether tf['app_date']
is within the df['valid_from']
and df['valid_to']
for matching id
b) If yes, then copy the column s_flag
to tf
dataframe for matching id
I tried the below but am not sure whether the below is efficient for million records plus dataframes
t1 = tf.merge(df, how = 'left',on=['id'])
t1 = t1.loc[(t1.app_date >= t1.valid_from) & (t1.app_date <= t1.valid_to),['id','s_flag','app_date']]
tf.merge(t1, how = 'inner',on=['id','app_date'])
While the above works in sample data, but in real data, for some records, I encounter issues like below
You can see that 9/1/2017
approval date doesn't meet the condition for the 2nd and 3rd row but it is still returned as output. This is incorrect.
I expect my output to be like as shown below
id app_date s_flag
0 1 2017-03-20 0.0
2 3 2017-10-18 NaN
3 4 2017-02-15 NaN