How do you loop faster in pandas? I'm using two loops: the first iterates over the dates and the second over the symbols, so for every day I go through all of the symbols and analyze their data.

The code below works properly but slows down as I add more symbols.

import pandas as pd

# self.Price contains the price data for multiple symbols
# self.ActiveSymbols contains the symbol names as strings
backtest = pd.concat(self.Price, keys=self.ActiveSymbols, axis=1)

for date in backtest.index:
    for symbol in self.ActiveSymbols:
        # compute something......
        # single .loc lookup on the (symbol, field) column instead of chained indexing
        backtest.loc[date, (symbol, 'close')]

.......

analyzing 1 symbol: time 0.4705786999999999

analyzing 5 symbols: time 3.2083443000000003

.......

  • The fastest of all loops in pandas is the absence of it ;) Please give a minimal example of your data and explain what you are trying to achieve. If you want speed, you need to find a way to vectorize your code. – mozway Sep 26 '22 at 12:33
  • I am trying to simulate the live markets. I have little knowledge of vectorization and don't know how to use it :>. What I'm trying to achieve is a simulation of live markets where, on every simulated day, I scan the symbols and analyze them. Once I have analyzed a symbol on the current date, I make a decision and place it into another dataframe where I manage orders. – Karl robeck Alferez Sep 26 '22 at 12:42
  • The real goal (what the data is) doesn't matter so much; what matters is a description of what you are trying to do in terms of the data itself, whether it's counting apples or stocks. You need to provide a reproducible example. See [how to ask](https://stackoverflow.com/help/how-to-ask) and [reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Sep 26 '22 at 12:46
  • Use a vectorised process; loops in pandas are sub-optimal... – D.L Sep 26 '22 at 13:48
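
To make the vectorization suggestion in the comments above concrete, here is a minimal sketch. The question does not show the real data or the analysis, so everything here is an assumption for illustration: the symbol names `AAA` and `BBB`, the random prices, and the rule itself (close above its 3-day rolling mean). The point is only that the per-date, per-symbol lookup can be replaced by operations on a whole dates x symbols frame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the question does not show the real frames.
dates = pd.date_range("2022-01-03", periods=6, freq="B")
symbols = ["AAA", "BBB"]  # made-up symbol names
rng = np.random.default_rng(0)
prices = {
    s: pd.DataFrame(
        {"close": 100 + rng.normal(0, 1, len(dates)).cumsum()}, index=dates
    )
    for s in symbols
}

# Same layout as in the question: columns are a (symbol, field) MultiIndex.
backtest = pd.concat(prices, axis=1)

# Pull every symbol's close into one dates x symbols frame in a single step.
close = backtest.xs("close", axis=1, level=1)

# Example rule (an assumption, not the asker's strategy):
# flag days where the close is above its 3-day rolling mean.
# Days without a full window compare against NaN and come out False.
signal = close > close.rolling(3).mean()
```

`signal` has one row per date and one column per symbol, so the result for every symbol on every day is computed without a Python-level loop. Whether this applies depends on the real analysis: it works when each day's result can be expressed from the price history alone.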

1 Answer

AFAIK there's not really a way to speed up looping in Python. It doesn't have anything to do with pandas either; plain Python for- and while-loops are just rather slow. So you would have to find another way to go about it if you need speed. See this: https://youtu.be/Qgevy75co8c

flowerboy
  • Pandas is specifically designed to work quickly with big data through matrix/vector operations, and will perform poorly if you loop row-by-row over the data. In this case performance **is** directly related to using pandas improperly. – Ari Cooper-Davis Sep 26 '22 at 12:37
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Sep 30 '22 at 09:19
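
If the simulation really has to walk day by day (for example because each decision depends on open positions, as the asker describes), one common option is to keep the loop but move the pandas lookups out of it. The sketch below is an assumption-laden illustration, not the asker's strategy: the prices, the symbols `AAA` and `BBB`, and the buy-on-rise / sell-on-fall rule are all made up. It extracts the values into a NumPy array once, loops over that, and collects orders in a plain list that is turned into a DataFrame at the end.

```python
import pandas as pd

# Hypothetical close prices; symbol names and values are invented.
dates = pd.date_range("2022-01-03", periods=5, freq="B")
close = pd.DataFrame(
    {"AAA": [10.0, 11.0, 10.5, 12.0, 11.5], "BBB": [20.0, 19.0, 19.5, 19.0, 21.0]},
    index=dates,
)

values = close.to_numpy()   # dates x symbols array, extracted once
symbols = list(close.columns)
holding = dict.fromkeys(symbols, False)
orders = []                 # plain tuples, not rows appended to a DataFrame

for i in range(1, len(dates)):
    for j, symbol in enumerate(symbols):
        # Made-up rule: buy after an up day if flat, sell otherwise if holding.
        rose = values[i, j] > values[i - 1, j]
        if rose and not holding[symbol]:
            orders.append((dates[i], symbol, "buy", values[i, j]))
            holding[symbol] = True
        elif not rose and holding[symbol]:
            orders.append((dates[i], symbol, "sell", values[i, j]))
            holding[symbol] = False

# Build the order table once, after the loop.
orders_df = pd.DataFrame(orders, columns=["date", "symbol", "side", "price"])
```

The loop is still plain Python, but each iteration now does array indexing instead of a `.loc` lookup on a MultiIndex frame, and the order table is built in one call rather than grown row by row.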