I'm trying to merge two dataframes by time with multiple matches. I'm looking for all the instances of df2 whose timestamp
falls 7 days or less before endofweek
in df1. There may be more than one record that fits the case, and I want all of the matches, not just the first or last (which pd.merge_asof does).
import pandas as pd
df1 = pd.DataFrame({'endofweek': ['2019-08-31', '2019-08-31', '2019-09-07', '2019-09-07', '2019-09-14', '2019-09-14'], 'GroupCol': [1234,8679,1234,8679,1234,8679]})
df2 = pd.DataFrame({'timestamp': ['2019-08-30 10:00', '2019-08-30 10:30', '2019-09-07 12:00', '2019-09-08 14:00'], 'GroupVal': [1234, 1234, 8679, 1234], 'TextVal': ['1234_1', '1234_2', '8679_1', '1234_3']})
df1['endofweek'] = pd.to_datetime(df1['endofweek'])
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
I've tried
pd.merge_asof(df1, df2, tolerance=pd.Timedelta('7d'), direction='backward', left_on='endofweek', right_on='timestamp', left_by='GroupCol', right_by='GroupVal')
but that gets me
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234 2019-08-30 10:30:00 1234.0 1234_2
1 2019-08-31 8679 NaT NaN NaN
2 2019-09-07 1234 NaT NaN NaN
3 2019-09-07 8679 NaT NaN NaN
4 2019-09-14 1234 2019-09-08 14:00:00 1234.0 1234_3
5 2019-09-14 8679 2019-09-07 12:00:00 8679.0 8679_1
I'm losing the text 1234_1. Is there way to do a sort of outer join for pd.merge_asof
, where I can keep all of the instances of df2
and not just the first or last?
My ideal result would look like this (assuming that the endofweek
times are treated like 00:00:00 on that date):
endofweek GroupCol timestamp GroupVal TextVal
0 2019-08-31 1234 2019-08-30 10:00:00 1234.0 1234_1
1 2019-08-31 1234 2019-08-30 10:30:00 1234.0 1234_2
2 2019-08-31 8679 NaT NaN NaN
3 2019-09-07 1234 NaT NaN NaN
4 2019-09-07 8679 NaT NaN NaN
5 2019-09-14 1234 2019-09-08 14:00:00 1234.0 1234_3
6 2019-09-14 8679 2019-09-07 12:00:00 8679.0 8679_1