
I want to perform various simulations of stock data in a for loop.

Before starting the loop, I initialize an object of the class Simulation which, after its query_stock_information() method has been called, stores all the data needed for a simulation. I then loop through various service levels and, in each iteration, run exactly the same simulation by calling the class's simulate() method (with exactly the same amount of data and exactly the same operations; only the numbers vary).

However, the simulation becomes slower with every iteration: the first iteration takes about 6 minutes, the last one more than 30 minutes. This surprises me, as there should be no difference between iterations in terms of CPU and memory usage. The only thing that grows during the run is the global results list, and the amount of data in it is actually not that big.
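One way to check the assumption that CPU and memory usage should not differ between iterations is to record, for every service level, the wall-clock time of the simulation call and the memory currently held by Python objects. The sketch below is only an illustration: `fake_simulate()` and `run_with_diagnostics()` are hypothetical names, and `fake_simulate()` stands in for `simulation.simulate(servicelevel)` by returning the same amount of data on every call, as the real method is described to do.

```python
import time
import tracemalloc


def fake_simulate(servicelevel, n_items=2500, n_values=10):
    """Hypothetical stand-in for Simulation.simulate(): returns the same
    amount of data on every call, only the numbers change."""
    return [[servicelevel * j for j in range(n_values)] for _ in range(n_items)]


def run_with_diagnostics(servicelevels):
    """Run one simulation per service level and record, for each iteration,
    the time spent in the simulation call and the memory traced afterwards."""
    results = []
    timings = []  # seconds spent in the simulation call per iteration
    memory = []   # bytes currently allocated (as seen by tracemalloc) after each iteration
    tracemalloc.start()
    try:
        for servicelevel in servicelevels:
            start = time.perf_counter()
            batch = fake_simulate(servicelevel)
            timings.append(time.perf_counter() - start)
            results.extend(batch)
            current, _peak = tracemalloc.get_traced_memory()
            memory.append(current)
    finally:
        tracemalloc.stop()
    return results, timings, memory


if __name__ == "__main__":
    levels = [round(0.800 + 0.005 * i, 3) for i in range(40)]
    results, timings, memory = run_with_diagnostics(levels)
    for level, t, m in zip(levels, timings, memory):
        print(f"servicelevel={level:.3f}  time={t:.3f}s  traced={m / 1e6:.1f} MB")
```

Tracing allocations slows Python down noticeably, so absolute times will be higher than in a normal run; only the trend across iterations is of interest. If the time per call keeps rising while the work per call stays constant, something that accumulates across iterations is involved rather than the simulation logic itself.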

Here is some pseudo code of the simulation:

import numpy as np

class Simulation:
    (...)

    def query_stock_information(self):
        # request stock data from data warehouse and save in self.stock_data
        ...  # placeholder: the actual query is not shown here

    def simulate(self, servicelevel):
        self.results = []
        # execute simulation with servicelevel input and stock information from self.stock_data
        # save results as items in self.results

simulation = Simulation()
simulation.query_stock_information()
results = []
for servicelevel in np.arange(0.800, 1.000, 0.005):
    print("Start simulation (servicelevel={:.3f})".format(servicelevel))
    simulation.simulate(servicelevel)
    results.extend(simulation.results)  # here I add 2500 pandas Series of length 10 per iteration
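The comment on the last line of the loop means that, after all 40 service levels, the global `results` list holds about 40 × 2500 = 100,000 separate pandas Series, each carrying per-object overhead (including its index) on top of its 10 numbers. Whether this is what slows the loop down is not established here; the sketch below is just one more compact storage layout worth trying, assuming every Series shares the same 10-element index. The helpers `fake_simulate_values()` and `collect_compact()` are hypothetical, and the random values only stand in for real simulation output.

```python
import numpy as np


def fake_simulate_values(servicelevel, n_items=2500, n_values=10, seed=0):
    """Hypothetical stand-in for Simulation.simulate(): returns the values of
    the 2500 length-10 results as one (n_items, n_values) float array."""
    rng = np.random.default_rng(seed)
    return servicelevel * rng.random((n_items, n_values))


def collect_compact(servicelevels):
    """Keep one 2-D array per service level instead of thousands of small
    objects, then stack everything into a single array at the end."""
    blocks = []
    levels = []
    for servicelevel in servicelevels:
        values = fake_simulate_values(servicelevel)
        blocks.append(values)
        # remember which service level each row belongs to
        levels.append(np.full(len(values), servicelevel))
    return np.vstack(blocks), np.concatenate(levels)


if __name__ == "__main__":
    servicelevels = np.arange(0.800, 1.000, 0.005)
    values, levels = collect_compact(servicelevels)
    print(values.shape, values.nbytes / 1e6, "MB")
```

If a pandas object is needed afterwards, `pd.DataFrame(values)` builds it from the stacked array in a single step instead of from many individual Series.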
  • Depending on the amount of in-memory data it references, it could be causing the OS to swap a lot of pages to and from disk to avoid capping out and stalling the system. This swap thrashing is a huge performance hit. – Todd Mar 01 '20 at 22:10
  • To identify which object is tying up the most memory, you can use a profiler (this answer explains how: https://stackoverflow.com/a/552810/7915759). If you want to profile which specific functions consume the most time, you can use cProfile. – Todd Mar 01 '20 at 22:14
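Following the two comments above, both kinds of profiling can be sketched with the standard library alone: `cProfile` shows which functions consume the time of a single iteration, and a `tracemalloc` snapshot lists the source lines that allocated the most memory still alive after several iterations. This uses `tracemalloc` rather than any third-party memory profiler. The function `one_iteration()` is a hypothetical stand-in for `simulation.simulate(servicelevel)` followed by `results.extend(simulation.results)`.

```python
import cProfile
import io
import pstats
import tracemalloc

results = []  # grows across iterations, like the global list in the question


def one_iteration(servicelevel, n_items=2500, n_values=10):
    """Hypothetical stand-in for simulation.simulate(servicelevel) followed by
    results.extend(simulation.results)."""
    batch = [[servicelevel * j for j in range(n_values)] for _ in range(n_items)]
    results.extend(batch)


def profile_time(servicelevel, top=5):
    """Return cProfile statistics (sorted by cumulative time) for one iteration."""
    profiler = cProfile.Profile()
    profiler.enable()
    one_iteration(servicelevel)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


def profile_memory(servicelevels, top=3):
    """Run several iterations and return the source lines that allocated the
    most memory that is still alive afterwards."""
    tracemalloc.start()
    try:
        for servicelevel in servicelevels:
            one_iteration(servicelevel)
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return snapshot.statistics("lineno")[:top]


if __name__ == "__main__":
    print(profile_time(0.8))
    for stat in profile_memory([0.805, 0.81, 0.815]):
        print(stat)
```

Comparing the `profile_time()` report for an early and a late service level shows whether the same functions simply take longer later on, while the `tracemalloc` statistics point at the lines whose allocations are accumulating.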
