I have the following dataframes:
print(df)
client_id route_id start
1 1_2 2019-01-01 03:00:00
8 1_4 2019-01-23 10:00:00
3 1_2 2019-01-15 04:10:00
4 1_10 2019-01-10 05:00:00
...
print(accident_df)
route_id start end
1_2 2018-12-24 2018-01-03
1_4 2019-01-25 2019-01-29
1_2 2019-02-24 2019-02-28
1_10 2019-01-05 2019-01-15
...
I would like to merge them, following these conditions (&):
1."route_id" must match
2.df.start must be in between accident_df.start and accident_df.end (therefore, greater than the first one and smaller than the latter)
print(final_df)
client_id route_id start
1 1_2 2019-01-01 03:00:00
4 1_10 2019-01-10 05:00:00
Since both dataframes are quite large (>10M lines), and thousands of "route_id"s exist, a solution based on merging on "route_id" and then selecting rows by conditions would be extremely inefficient, causing memory-related problems (as in Python Pandas - Compare 2 dataframes, multiple parameters).
How can I achieve that efficiently?