How can I speed up this loop where I'm iterating through rows in a dataframe?

Question

TimePeriods2 = [1440] # other elements will be populated to this list later
MovingAverages = [50] # other elements will be populated to this list later

for Time in TimePeriods2:
    k = 1
    for MA in MovingAverages:
        for i in range(len(df)):
            if (df["Close"][i] > df[f"{MA*Time} MA"][i]) & (df["Close"][i+Time] > df[f"{MA*Time} MA"][i+Time]):
                df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"][i] = "Buy"
        df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"] = df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"].fillna("Sell")
        df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"] = df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"].astype("category")
        k += 1
    print(f"Time period {Time} complete")

I tried the following:

TimePeriods2 = [1440] # other elements will be populated to this list later
MovingAverages = [50] # other elements will be populated to this list later

for Time in TimePeriods2:
    k = 1
    for MA in MovingAverages:
        df.loc[( (df["Close"] > df[f"{MA*Time} MA"]) & (df["Close"][i+Time] > df[f"{MA*Time} MA"][i+Time]) ), f"Moving Average (not crossover) Strategy {k} on {Time} Mins"] = "Buy"
        df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"] = df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"].fillna("Sell")
        df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"] = df[f"Moving Average (not crossover) Strategy {k} on {Time} Mins"].astype("category")
        k += 1
    print(f"Time period {Time} complete")

But it seems you can't use the .loc method like this: `i` is no longer defined once the row loop is gone, so the `[i+Time]` lookups fail. My dataframe has ~1.4 million rows, so the first approach takes forever. The second is much quicker, except that I can't get it to work properly.

You **really** shouldn't be using `pandas` like this at all. — juanpa.arrivillaga, Jul 16 '21 at 23:39
I closed this as a duplicate. In particular, pay attention to [this answer](https://stackoverflow.com/a/55557758/5014455) for a deep dive into how you should be working with dataframes. But at the *very least* **never** loop over `len(df)` then try to do `df.iloc[i, whatever]`, that will be pretty much the slowest thing you can do. `iloc` and `loc` are optimized for bulk operations, they require a ton of overhead and doing it row-by-row will make your code super slow. Instead, iterate, if you must, using `df.itertuples()` — juanpa.arrivillaga, Jul 16 '21 at 23:40
In the second solution part of this question I proposed a way to compute this without looping over len(df) as I already knew this was not a fast method. I was really looking for help to solve my issue with using df.loc.... — MoonBoi9001, Jul 17 '21 at 11:42
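One way to express the `[i+Time]` lookup without a row index is `Series.shift(-Time)`, which lines up the value `Time` rows ahead with the current row. The following is only a minimal sketch under some assumptions: the function name `add_ma_strategy_columns` is made up for illustration, the `"{MA*Time} MA"` columns already exist as in the question, and rows are meant to be compared by position. It also overwrites the strategy column instead of filling only the missing values, and it labels the last `Time` rows "Sell", since they have no row `Time` steps ahead (the original loop would raise an error on those rows).

```python
import numpy as np
import pandas as pd


def add_ma_strategy_columns(df, time_periods, moving_averages):
    """Label each row "Buy" or "Sell" without a Python-level row loop.

    Hypothetical helper: column names follow the pattern used in the question.
    """
    for time in time_periods:
        for k, ma in enumerate(moving_averages, start=1):
            ma_col = f"{ma * time} MA"
            out_col = f"Moving Average (not crossover) Strategy {k} on {time} Mins"

            # Close is above its moving average on the current row.
            above_now = df["Close"] > df[ma_col]

            # The same test `time` rows ahead: shift(-time) moves row i + time up to row i.
            # The last `time` rows become NaN, and a comparison with NaN is False.
            above_later = df["Close"].shift(-time) > df[ma_col].shift(-time)

            # "Buy" where both conditions hold, "Sell" everywhere else.
            labels = np.where(above_now & above_later, "Buy", "Sell")
            df[out_col] = pd.Categorical(labels, categories=["Buy", "Sell"])
    return df
```

With the lists from the question this would be called as `add_ma_strategy_columns(df, TimePeriods2, MovingAverages)`. Because both comparisons run over whole columns at once, the cost per strategy is a handful of array operations rather than one Python iteration per row.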
