I have 2 dataframes, df1 and df2.
df1 (the lookup)
group_id date value
0 105716 1/30/2019 Soccer
1 105717 1/30/2019 Football
2 105718 1/30/2019 Rest
3 105719 1/30/2019 Soccer
4 105716 1/31/2019 Rest
5 105717 1/31/2019 Rest
6 105718 02/01/2019 Football
7 105719 02/01/2019 Soccer
8 105719 02/02/2019 Tennis
8 105722 02/03/2019 Tennis
df2 (the parent)
GROUP_ID STARTDATE ENDDATE
0 105716 1/30/2019 1/30/2019
1 105717 1/30/2019 1/30/2019
2 105718 1/30/2019 1/30/2019
3 105719 1/30/2019 1/30/2019
4 105716 1/30/2019 1/31/2019
5 105717 1/31/2019 1/31/2019
6 105718 1/31/2019 1/31/2019
7 105719 1/31/2019 1/31/2019
8 105716 1/31/2019 1/31/2019
9 105717 1/31/2019 1/31/2019
10 105718 1/31/2019 1/31/2019
11 105719 1/31/2019 2/1/2019
12 105716 2/1/2019 2/1/2019
13 105717 2/1/2019 2/1/2019
14 105718 2/1/2019 2/1/2019
15 105719 2/1/2019 2/1/2019
16 105716 2/1/2019 2/1/2019
17 105717 2/1/2019 2/1/2019
18 105718 2/1/2019 2/1/2019
19 105719 2/1/2019 2/1/2019
20 105716 2/1/2019 2/2/2019
21 105717 2/2/2019 2/2/2019
22 105718 2/2/2019 2/2/2019
23 105719 2/2/2019 2/2/2019
24 105716 2/2/2019 2/2/2019
25 105717 2/2/2019 2/2/2019
26 105718 2/2/2019 2/2/2019
27 105719 2/2/2019 2/3/2019
28 105716 2/3/2019 2/3/2019
29 105722 2/3/2019 2/3/2019
df2 (the output)
GROUP_ID STARTDATE ENDDATE VALUE
0 105716 1/30/2019 1/30/2019 Soccer
1 105717 1/30/2019 1/30/2019 Football
2 105718 1/30/2019 1/30/2019 Rest
3 105719 1/30/2019 1/30/2019 Soccer
4 105716 1/30/2019 1/31/2019 Rest
5 105717 1/31/2019 1/31/2019 Rest
6 105718 1/31/2019 1/31/2019 None
7 105719 1/31/2019 1/31/2019 None
8 105716 1/31/2019 1/31/2019 Rest
9 105717 1/31/2019 1/31/2019 Rest
10 105718 1/31/2019 1/31/2019 None
11 105719 1/31/2019 2/1/2019 None
12 105716 2/1/2019 2/1/2019 None
13 105717 2/1/2019 2/1/2019 None
14 105718 2/1/2019 2/1/2019 Football
15 105719 2/1/2019 2/1/2019 Soccer
16 105716 2/1/2019 2/1/2019 None
17 105717 2/1/2019 2/1/2019 None
18 105718 2/1/2019 2/1/2019 Football
19 105719 2/1/2019 2/1/2019 Soccer
20 105716 2/1/2019 2/2/2019 None
21 105717 2/2/2019 2/2/2019 None
22 105718 2/2/2019 2/2/2019 None
23 105719 2/2/2019 2/2/2019 Tennis
24 105716 2/2/2019 2/2/2019 None
25 105717 2/2/2019 2/2/2019 None
26 105718 2/2/2019 2/2/2019 None
27 105719 2/2/2019 2/3/2019 None
28 105716 2/3/2019 2/3/2019 None
29 105722 2/3/2019 2/3/2019 Tennis
I am trying to add a RESULT field to df2 and populate it with value from df1 where GROUP_ID = group_id and date is between STARTDATE and ENDDATE while keeping all the rows in df2. Any Nan\Null values will be set to 'None'. I can do this with a loop but it takes a while to go through everything what I tried was numpy.where()
df2['RESULT'] = 'None'
df2.result = np.where(((df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))), df1.value, 'None')
and vectorized approach
df2.result = df1.value[(df1.group_id==df2.GROUP_ID)&((df1.date>=df2.STARTDATE)&(df1.date>=df2.ENDDATE))]
and a merge approach
df_activity = pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')[((pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['STARTDATE'] <= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id').date)&(pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['ENDDATE'] >= pd.merge(df2, df1, left_on='GROUP_ID', right_on='group_id')['date']))]
The first 2 I tried give me an error
ValueError: Can only compare identically-labeled DataFrame objects
the merge works but it also drops all the rows that don't match. I think I can work around this with another merge but if there is a faster more streamlined way I would like to use it.