For my current project I need to merge some dataframes based on time intervals. There are of course plenty of answers for this question, for example, I used answer from this question previously and it works great, but it involves double for-cycles.
This current issues is this. I have a series of ~500 dataframes similar to df_1 here:
time value
42 2020-07-01 00:57:16 0.356310
43 2020-07-01 01:00:20 0.472270
44 2020-07-01 03:33:40 0.356310
45 2020-07-01 03:45:37 0.356310
46 2020-07-01 03:57:51 0.425053
47 2020-07-01 05:29:20 0.340471
and they need to have event
column joined from df_2:
start end event
49 2020-07-01 03:07:16 2020-07-01 03:09:51 1.0
50 2020-07-01 03:09:51 2020-07-01 03:12:30 1.0
54 2020-07-01 03:21:21 2020-07-01 03:24:46 1.0
58 2020-07-01 03:33:40 2020-07-01 03:36:43 1.0
59 2020-07-01 03:36:43 2020-07-01 03:39:33 1.0
60 2020-07-01 03:39:33 2020-07-01 03:42:57 1.0
in case df_1.time
falls between df_2.start
and df_2.end
so that in the end I can get:
time value event
42 2020-07-01 00:57:16 0.356310 NaN
43 2020-07-01 01:00:20 0.472270 NaN
44 2020-07-01 03:33:40 0.356310 1.0
45 2020-07-01 03:45:37 0.356310 NaN
46 2020-07-01 03:57:51 0.425053 NaN
47 2020-07-01 05:29:20 0.340471 NaN
The problem is that as I said, I have ~500 dataframes similar to df_1 with over 10k records each and df_2 itself is ~40000 rows long. So looping over all of the rows is out of the question. It will take and exorbitant amount of time.
But for the life of me I can't find any vectorized method in Pandas that can be used for that purpose. Any ideas?