I have two dataframes, df1 and df2. df1 serves as a reference (lookup) table for df2: each row of df2 has to be matched against the rows of df1, the matching value from df1 merged into df2, and the new df2 output.
df1:
   RB  BeginDate   EndDate  Valindex0
0  00   19000100  19811231         45
1  00   19820100  19841299         47
2  00   19850100  20010699         50
3  00   20010700  99999999         39
df2:
   RB  IssueDate gs
0  L3   19990201  8
1  00   19820101  G
2  48   19820101  G
3  50   19820101  G
4  50   19820101  G
5  00   19860101  G
6  52   19820101  G
7  53   19820101  G
8  00   19500201  G
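For reproducibility, the two frames above can be rebuilt roughly as follows. This is only a sketch: the dtypes are an assumption (RB and gs as strings, the date-like columns as plain integers), since the printed output does not show them.

```python
import pandas as pd

# Assumption: RB and gs are strings; BeginDate, EndDate, IssueDate and
# Valindex0 are integers. The real data may use other dtypes.
df1 = pd.DataFrame({
    'RB': ['00', '00', '00', '00'],
    'BeginDate': [19000100, 19820100, 19850100, 20010700],
    'EndDate': [19811231, 19841299, 20010699, 99999999],
    'Valindex0': [45, 47, 50, 39],
})

df2 = pd.DataFrame({
    'RB': ['L3', '00', '48', '50', '50', '00', '52', '53', '00'],
    'IssueDate': [19990201, 19820101, 19820101, 19820101, 19820101,
                  19860101, 19820101, 19820101, 19500201],
    'gs': ['8', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G'],
})
```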
How can I merge these two dataframes based on the following condition?
if df1['BeginDate'] <= df2['IssueDate'] <= df1['EndDate'] and df1['RB']==df2['RB']:
    merge the value of df1['Valindex0'] into df2
Note that the final output is df2 with the values from df1 merged in, not the other way around, because df1 is only a lookup table for df2.
The output should look like:
df2:
   RB  IssueDate gs Valindex0
0  L3   19990201  8      None
1  00   19820101  G        47  # df2['RB']==df1['RB'] and df2['IssueDate'] between df1['BeginDate'] and df1['EndDate'] of this row
2  48   19820101  G      None
3  50   19820101  G      None
4  50   19820101  G      None
5  00   19860101  G        50
6  52   19820101  G      None
7  53   19820101  G      None
8  00   19500201  G        45
I know one way to do this, but it is extremely slow, especially when df1 has many rows:
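import numpy as np

conditions = []
for index, row in df1.iterrows():
    # One boolean mask over df2 for each lookup row of df1
    conditions.append((df2['IssueDate'] >= row['BeginDate']) &
                      (df2['IssueDate'] <= row['EndDate']) &
                      (df2['RB'] == row['RB']))
# For each df2 row, take Valindex0 from the first df1 row whose mask is True
df2['Valindex0'] = np.select(conditions, df1['Valindex0'], default=None)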
Any faster solution?
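One direction that might avoid the Python-level loop is to merge on RB first and then filter by the date range. The following is only a minimal sketch, not a benchmarked solution: the helper name `add_valindex` and the temporary column `_row` are hypothetical, the dtypes are assumed as noted in the comments, and unmatched rows come out as NaN rather than None.

```python
import pandas as pd

def add_valindex(df1, df2):
    """Hypothetical helper: attach df1['Valindex0'] to df2 by RB and date range."""
    # Tag each df2 row with its position so matches can be mapped back.
    left = df2[['RB', 'IssueDate']].assign(_row=range(len(df2)))
    # Pair every df2 row with every df1 row that shares the same RB.
    pairs = left.merge(df1, on='RB', how='inner')
    # Keep only the pairs whose IssueDate lies inside [BeginDate, EndDate].
    hit = pairs[pairs['IssueDate'].between(pairs['BeginDate'], pairs['EndDate'])]
    # Assumption: ranges in df1 do not overlap within one RB; if they do,
    # the first match per df2 row is kept.
    hit = hit.drop_duplicates('_row')
    matched = hit.set_index('_row')['Valindex0']
    out = df2.copy()
    # Rows without a match become NaN (so the column is float, not None).
    out['Valindex0'] = matched.reindex(range(len(df2))).to_numpy()
    return out

# Sample data from the question (assumed dtypes: strings for RB/gs, ints for dates).
df1 = pd.DataFrame({
    'RB': ['00', '00', '00', '00'],
    'BeginDate': [19000100, 19820100, 19850100, 20010700],
    'EndDate': [19811231, 19841299, 20010699, 99999999],
    'Valindex0': [45, 47, 50, 39],
})
df2 = pd.DataFrame({
    'RB': ['L3', '00', '48', '50', '50', '00', '52', '53', '00'],
    'IssueDate': [19990201, 19820101, 19820101, 19820101, 19820101,
                  19860101, 19820101, 19820101, 19500201],
    'gs': ['8', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G'],
})

result = add_valindex(df1, df2)
print(result)
```

The intermediate merge holds one row per (df2 row, df1 row) pair with the same RB, so it trades memory for speed; whether that is acceptable depends on how many df1 ranges exist per RB.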