I am trying to optimize this block of slicing code:

for row in tempDF.itertuples(index=True):
        # x and y come from somewhere else
        tp = tempDF.iloc[x:y]

        # created a vector of boolean based on
        # https://datascience.stackexchange.com/questions/23264/

        condition = (
            (tp['date_minute'].dt.date == row.Index.date()) &
            (tp['date_minute'] > row.Index) &
            (pd.Timestamp('10:30').time() <= tp['date_minute'].dt.time) &
            (tp['date_minute'].dt.time <= pd.Timestamp('17:35').time())
        )

        sliced = tp[condition]

According to the line_profiler extension, the condition line accounts for about 17% of the execution time of a larger piece of code, which is roughly 2 seconds. The whole thing takes about 12 seconds for a dataframe of 30,000 rows, so I am trying to optimize as much as possible.

Is there a way of reducing the time spent in this block?

Thanks.

rebob
  • Kindly share sample data with the expected output. It could be significantly faster if you used vectorised methods and avoided this form of iteration. – sammywemmy Apr 19 '20 at 20:30
  • Looping over dataframes is generally very slow; this answer explains why in detail and gives a few hints on possible solutions: https://stackoverflow.com/a/55557758/4014051 – Ralvi Isufaj Apr 19 '20 at 21:34
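
Along the lines of the comments above, one possible direction is to compute everything that does not depend on the current row once, before the loop, and to compare plain NumPy arrays inside it. The sketch below is only an illustration under assumptions that the question does not confirm: date_minute is a timezone-naive datetime64 column, the helper names precompute_masks and slice_after are made up, and x and y are set to placeholder values because their real origin is not shown. No speedup has been measured here.

```python
import pandas as pd


def precompute_masks(df):
    """Compute the parts of the condition that do not depend on the current row."""
    dm = df["date_minute"]
    day = dm.dt.normalize()   # midnight of each timestamp
    offset = dm - day         # time of day expressed as a Timedelta
    in_window = (
        (offset >= pd.Timedelta(hours=10, minutes=30))
        & (offset <= pd.Timedelta(hours=17, minutes=35))
    )
    # Plain NumPy arrays avoid repeated .dt accessor work inside the loop.
    return dm.to_numpy(), day.to_numpy(), in_window.to_numpy()


def slice_after(df, ts, x, y, dm_values, day_values, in_window):
    """Rows of df.iloc[x:y] on the same day as ts, later than ts, inside the window."""
    row_day = ts.normalize().to_datetime64()
    row_ts = ts.to_datetime64()
    mask = (
        (day_values[x:y] == row_day)
        & (dm_values[x:y] > row_ts)
        & in_window[x:y]
    )
    return df.iloc[x:y][mask]


# Toy data standing in for tempDF (hypothetical; the real data is not shown).
idx = pd.date_range("2020-04-17 09:00", periods=400, freq="7min")
demo = pd.DataFrame({"date_minute": idx, "value": range(len(idx))}, index=idx)

dm_values, day_values, in_window = precompute_masks(demo)

for row in demo.itertuples(index=True):
    x, y = 0, len(demo)  # placeholder bounds; in the question they come from elsewhere
    sliced = slice_after(demo, row.Index, x, y, dm_values, day_values, in_window)
```

The time-of-day window and the per-row calendar day are the loop-invariant parts, so only two array comparisons and one boolean AND remain per row. Removing the loop entirely, as the comments suggest, would need to know how x and y are derived.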

0 Answers