I have a DataFrame with measurements, containing the measured values and the times at which they were taken:
```python
import datetime
import numpy as np
import pandas
time = [datetime.datetime(2011, 1, 1, np.random.randint(0, 23), np.random.randint(1, 59)) for _ in range(10)]
df_meas = pandas.DataFrame({'time': time, 'value': np.random.random(10)})
```
For example:

```
                 time     value
0 2011-01-01 21:56:00  0.115025
1 2011-01-01 04:40:00  0.678882
2 2011-01-01 02:18:00  0.507168
3 2011-01-01 22:40:00  0.938408
4 2011-01-01 12:53:00  0.193573
5 2011-01-01 19:37:00  0.464744
6 2011-01-01 16:06:00  0.794495
7 2011-01-01 18:32:00  0.482684
8 2011-01-01 13:26:00  0.381747
9 2011-01-01 01:50:00  0.035798
```
The data-taking is organized in periods (runs), and I have another DataFrame for them:
```python
start = pandas.date_range('1/1/2011', periods=5, freq='h')
stop = start + np.timedelta64(50, 'm')
df_runs = pandas.DataFrame({'start': start, 'stop': stop}, index=np.random.randint(0, 1000000, 5))
df_runs.index.name = 'run'
```
For example:

```
                     start                stop
run
721158 2011-01-01 00:00:00 2011-01-01 00:50:00
340902 2011-01-01 01:00:00 2011-01-01 01:50:00
211578 2011-01-01 02:00:00 2011-01-01 02:50:00
120232 2011-01-01 03:00:00 2011-01-01 03:50:00
122199 2011-01-01 04:00:00 2011-01-01 04:50:00
```
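Now I want to merge the two tables, so that each measurement gets the run during which it was taken (NaN if it falls outside every run), obtaining:

```
                 time     value     run
0 2011-01-01 21:56:00  0.115025     NaN
1 2011-01-01 04:40:00  0.678882  122199
2 2011-01-01 02:18:00  0.507168  211578
3 2011-01-01 22:40:00  0.938408     NaN
...
```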
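Stated as a rule, a measurement gets run r when `start_r <= time <= stop_r`, and NaN otherwise. One way to express that rule directly is a pandas `IntervalIndex` built from the runs. This is only a sketch: it uses a small hard-coded subset of the example rows instead of the random data, assumes both ends of a run are inclusive (`closed="both"`), and relies on runs never overlapping, since `get_indexer` raises an error on overlapping intervals.

```python
import pandas as pd

# Hard-coded subset of the example tables (instead of random data).
df_meas = pd.DataFrame({
    "time": pd.to_datetime(["2011-01-01 21:56", "2011-01-01 04:40",
                            "2011-01-01 02:18", "2011-01-01 22:40"]),
    "value": [0.115025, 0.678882, 0.507168, 0.938408],
})
df_runs = pd.DataFrame(
    {"start": pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00",
                              "2011-01-01 02:00", "2011-01-01 03:00",
                              "2011-01-01 04:00"])},
    index=pd.Index([721158, 340902, 211578, 120232, 122199], name="run"),
)
df_runs["stop"] = df_runs["start"] + pd.Timedelta(minutes=50)

# One interval per run; closed="both" treats start and stop as inside the run.
intervals = pd.IntervalIndex.from_arrays(df_runs["start"], df_runs["stop"], closed="both")

# Position of the interval containing each time, or -1 if none contains it.
pos = intervals.get_indexer(df_meas["time"])

# Look up the run id by position, then blank out the "-1" (no run) rows.
run_ids = pd.Series(df_runs.index.to_numpy()[pos], index=df_meas.index, dtype=float)
result = df_meas.assign(run=run_ids.where(pos >= 0))
print(result)
```

The run column comes out as float because NaN cannot be stored in an integer column.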
Time periods (`run`s) have a `start` and a `stop`, with `stop >= start`. Different runs never overlap. You can assume that runs are ordered by `run`, i.e. if `run1 < run2` then `start1 < start2` (even though this is not true in my example), or you can simply sort the table by `start`. You can also assume that `df_meas` is sorted by `time`.
How can I do that? Is there something built in? What is the most efficient way?
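As for something built in, `pandas.merge_asof` performs a nearest-preceding-key join that fits this case. One possible sketch, assuming inclusive run boundaries and a small hard-coded subset of the example rows (already sorted by `time`, as the question allows): `direction="backward"` pairs each measurement with the latest run whose `start` is not after its `time`, and a second step discards matches that fall after that run's `stop`.

```python
import pandas as pd

# Hard-coded subset of the example tables, sorted by time.
df_meas = pd.DataFrame({
    "time": pd.to_datetime(["2011-01-01 02:18", "2011-01-01 04:40",
                            "2011-01-01 21:56", "2011-01-01 22:40"]),
    "value": [0.507168, 0.678882, 0.115025, 0.938408],
})
df_runs = pd.DataFrame(
    {"start": pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00",
                              "2011-01-01 02:00", "2011-01-01 03:00",
                              "2011-01-01 04:00"])},
    index=pd.Index([721158, 340902, 211578, 120232, 122199], name="run"),
)
df_runs["stop"] = df_runs["start"] + pd.Timedelta(minutes=50)

# merge_asof needs both sides sorted on their join keys.
runs = df_runs.reset_index().sort_values("start")
merged = pd.merge_asof(df_meas.sort_values("time"), runs,
                       left_on="time", right_on="start", direction="backward")

# Keep the matched run only if the measurement is not after that run's stop
# (rows with no preceding run have stop=NaT and also become NaN).
merged["run"] = merged["run"].where(merged["time"] <= merged["stop"])
result = merged.drop(columns=["start", "stop"])
print(result)
```

The result comes back in `time` order with a fresh index, so if the original row order matters it has to be restored separately.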