I have a pandas DataFrame with one year of minute-level forex data (371,635 rows):
                           O        H        L        C
2017-01-02 02:00:00 1.05155 1.05197 1.05155 1.05190
2017-01-02 02:01:00 1.05209 1.05209 1.05177 1.05179
2017-01-02 02:02:00 1.05177 1.05198 1.05177 1.05178
2017-01-02 02:03:00 1.05188 1.05200 1.05188 1.05200
2017-01-02 02:04:00 1.05196 1.05204 1.05196 1.05203
I want to filter the data day by day to get an hour range:

from datetime import datetime

dt = datetime(2017, 1, 1)
df_day = df[df.index.date == dt.date()]
df_day_t = df_day.between_time('08:30', '09:30')
If I run this in a for loop over 200 days, it takes minutes. I suspect that on every iteration the line

df_day = df[df.index.date == dt.date()]

compares the date against every row in the data set (even though the index is sorted). Is there any way to speed up the filtering, or should I just fall back to an old-fashioned imperative for loop from January to December?
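For reference, the loop I mean is sketched below on a small synthetic frame (the data and the 3-day range are stand-ins, not my actual file); each pass does one full boolean comparison over the whole index before `between_time` narrows it down:

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd

# Synthetic stand-in for the minute-bar frame (random values,
# not the original 371,635-row data set).
idx = pd.date_range('2017-01-02 02:00', periods=3 * 24 * 60, freq='min')
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {c: rng.random(len(idx)) for c in ['O', 'H', 'L', 'C']},
    index=idx,
)

# The per-day loop as described: one boolean scan of the entire
# index for every day, then the hour-range filter.
results = []
dt = datetime(2017, 1, 2)
for _ in range(3):
    df_day = df[df.index.date == dt.date()]           # full-index comparison
    results.append(df_day.between_time('08:30', '09:30'))
    dt += timedelta(days=1)

windows = pd.concat(results)
# 61 minutes per day (both endpoints inclusive) x 3 days = 183 rows
```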