If I have a dataframe (df_data) like:
ID Time X Y Z H
05 2020-06-26 14:13:16 0.055 0.047 0.039 0.062
05 2020-06-26 14:13:21 0.063 0.063 0.055 0.079
05 2020-06-26 14:13:26 0.063 0.063 0.063 0.079
05 2020-06-26 14:13:31 0.095 0.102 0.079 0.127
... .. ... ... ... ... ... ...
01 2020-07-01 08:59:43 0.063 0.063 0.047 0.079
01 2020-07-01 08:59:48 0.055 0.055 0.055 0.079
01 2020-07-01 08:59:53 0.071 0.063 0.055 0.082
01 2020-07-01 08:59:58 0.063 0.063 0.047 0.082
01 2020-07-01 08:59:59 0.047 0.047 0.047 0.071
[17308709 rows x 8 columns]
which I want to filter by another dataframe of intervals (df_intervals), like:
int_id start end
1 2020-02-03 18:11:59 2020-02-03 18:42:00
2 2020-02-03 19:36:59 2020-02-03 20:06:59
3 2020-02-03 21:00:59 2020-02-03 21:31:00
4 2020-02-03 22:38:00 2020-02-03 23:08:00
5 2020-02-04 05:55:00 2020-02-04 06:24:59
... ... ...
1804 2021-01-10 13:50:00 2021-01-10 14:20:00
1805 2021-01-10 18:10:00 2021-01-10 18:40:00
1806 2021-01-10 19:40:00 2021-01-10 20:10:00
1807 2021-01-10 21:25:00 2021-01-10 21:55:00
1808 2021-01-10 22:53:00 2021-01-10 23:23:00
[1808 rows x 2 columns]
what is the most efficient way to do so? The dataset is large, and if I try to iterate over it like this:
for i in range(len(df_intervals)):
    df_filtered = df_data[df_data['Time'].between(df_intervals['start'][i], df_intervals['end'][i])]
    ...
it takes forever! I know that I shouldn't iterate over large dataframes, but I have no idea how else I could filter it by every interval in the second dataframe.
The steps I'm trying to perform are:
1- Get all the intervals (start/end columns) from df_intervals;
2- Use those intervals to build a new dataframe (df_stats) containing statistics of the measurement columns within each time range, per ID (a rough sketch of this follows the example). Example:
start end ID X_max X_min X_mean Y_max Y_min Y_mean ....
2020-02-03 18:11:59 2020-02-03 18:42:00 01 ... ... ... ... ... ... ... ...
2020-02-03 18:11:59 2020-02-03 18:42:00 02 ... ... ... ... ... ... ... ...
2020-02-03 18:11:59 2020-02-03 18:42:00 03 ... ... ... ... ... ... ... ...
2020-02-03 18:11:59 2020-02-03 18:42:00 04 ... ... ... ... ... ... ... ...
2020-02-03 18:11:59 2020-02-03 18:42:00 05 ... ... ... ... ... ... ... ...
2020-02-03 19:36:59 2020-02-03 20:06:59 01 ... ... ... ... ... ... ... ...
2020-02-03 19:36:59 2020-02-03 20:06:59 02 ... ... ... ... ... ... ... ...
2020-02-03 19:36:59 2020-02-03 20:06:59 03 ... ... ... ... ... ... ... ...
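To make the goal concrete, here is a rough sketch of the whole computation (assuming Time, start and end are already parsed as datetimes); it produces the df_stats layout above, but looping over all 1808 intervals on ~17 million rows is far too slow:

import pandas as pd

stats = []
for _, row in df_intervals.iterrows():
    # keep only the samples that fall inside this interval
    chunk = df_data[df_data['Time'].between(row['start'], row['end'])]
    if chunk.empty:
        continue
    # per-ID statistics for the measurement columns
    agg = chunk.groupby('ID')[['X', 'Y', 'Z', 'H']].agg(['max', 'min', 'mean'])
    agg.columns = ['_'.join(col) for col in agg.columns]  # X_max, X_min, X_mean, ...
    agg = agg.reset_index()
    agg.insert(0, 'start', row['start'])
    agg.insert(1, 'end', row['end'])
    stats.append(agg)

df_stats = pd.concat(stats, ignore_index=True)

Is there a vectorized way to get the same result without looping over every interval?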