Can looping over object instantiations cause a memory leak in Python?

Question

I'm running an agent-based model in Python 3.9 using object-oriented programming. The point of the model is to simulate a predator-prey population in a changing landscape. When I run multiple simulations in a for loop, the runtime of a single simulation increases with each run. I suspect there is some sort of memory leak, but I can't figure out where.

Here is a sketch of my code:

# Parameters
n_deers = ...
n_wolves = ...
# etc.


# Functions
def some_function(arg):
    pass 


# Helper objects
some_dict = ...


# Classes
class Deer:
   pass

class Wolf:
   pass

class Environment:

   def __init__(self):
      self.deers = [Deer(ID = i) for i in range(n_deers)]
      self.wolves = [Wolf(ID = i) for i in range(n_wolves)]
      
      self.data = pd.DataFrame()


   def simulation(self):
      pass


# Simulations
for i in range(100):
     environment = Environment()
     environment.simulation()
     environment.data.to_csv()

In words: I have global parameters, global functions, and a global dictionary that the class instances use. There is a class for each type of animal, and there is a class for the environment that generates a certain number of each animal inside the environment. The environment tracks these animals in a data frame during one run of simulation, in which the animals move, feed, reproduce, die etc.

My fear is that the animal instances (around 7000 animals per full-length simulation) are somehow being kept in memory from one simulation to the next. I don't have static class variables of the kind this article warns about: https://theorangeone.net/posts/static-vars/ . But of course, this could be anything.

Do you have an idea what could be causing this? Any help is greatly appreciated.

EDIT

I have been able (it seems) to isolate the problem. It seems to originate from the animal movement. Here is a minimal reproducible example. To explain: if the animals choose their next position at random from the adjacent cells, the problem does not seem to occur. Once I add memory, home ranges, and the function cell_choice(), the simulations take longer with each run. On my machine, with this parametrization, the first simulation takes between 3 and 4 seconds, and the last between 10 and 11.

# MINIMAL MOVEMENT MODEL

# IMPORTS
import random as rd
import numpy as np
import time
import psutil


# REPRODUCIBILITY
rd.seed(42)


# PARAMETERS
landscape_size = 11
n_deers = 100
years = 10
length_year = 360
timesteps = years*length_year
n_simulations = 20


# HELPER FUNCTIONS AND OBJECTS
# Landscape for first initialization
mock_landscape = np.zeros((landscape_size,landscape_size))

# Function to return a list of nxn cells around a given cell
def range_finder(matrix, position, radius):
    adj = []
    
    lower = 0 - radius
    upper = 1 + radius
    
    for dx in range(lower, upper):
        for dy in range(lower, upper):
            rangeX = range(0, matrix.shape[0])  # Identifies X bounds
            rangeY = range(0, matrix.shape[1])  # Identifies Y bounds
            
            (newX, newY) = (position[0]+dx, position[1]+dy)  # Identifies adjacent cell
            
            if (newX in rangeX) and (newY in rangeY) and (dx, dy) != (0, 0):
                adj.append((newX, newY))
    
    return adj

# Nested dictionary that contains all sets of neighbors for all possible distances up to half the landscape size
neighbor_dict = {d: {(i,j): range_finder(mock_landscape, (i,j), d)
                     for i in range(landscape_size) for j in range(landscape_size)}
                 for d in range(1,int(landscape_size/2)+1)}


# Function that picks the cell in the home range that was visited longest ago
def cell_choice(position, home_range, memory):
     # These are all the adjacent cells to the current position
     adjacent_cells = neighbor_dict[1][position]
     # This is the subset of cells of the adjacent cells belonging to homerange
     possible_choices = [i for i in adjacent_cells if i in home_range]
     # This yields the "master" indeces of those choices
     indeces = []
     for i in possible_choices:
         indeces.append(home_range.index(i))
     # This picks the index with the maximum value in the memory (ie visited longest ago)
     memory_values = [memory[i] for i in indeces]
     pick_index = indeces[memory_values.index(max(memory_values))]
     # Resets the memory counter of the picked cell to zero
     memory[pick_index] = 0
     # Adds one period to every other index
     other_indeces = [i for i in list(range(len(memory))) if i != pick_index]
     for i in other_indeces:
         memory[i] += 1
     # Returns the picked cell
     return home_range[pick_index]



# CLASS DEFINITIONS
class Deer:
    
    def __init__(self, ID):
        
        self.ID = ID
        self.position = (rd.randint(0,landscape_size-1),rd.randint(0,landscape_size-1))
        # Sets up a counter how long the deer has been in the cell
        self.time_spent_in_cell = 1
        
        # Defines a distance parameter that specifies the radius of the homerange around the base
        self.movement_radius = 1
        
        # Defines an initial home range around the position
        self.home_range = neighbor_dict[self.movement_radius][self.position]
        self.home_range.append(self.position)
        
        # Sets up a list of counters how long ago cells in the home range have been visited
        self.memory = [float('inf')]*len(self.home_range)
        self.memory[self.home_range.index(self.position)] = 0


    def move(self):
        
        self.position = cell_choice(self.position, self.home_range, self.memory)


class Environment:
    
    def __init__(self):
        
        self.landscape = np.zeros((landscape_size, landscape_size))
        self.deers = [Deer(ID = i) for i in range(n_deers)]

        
    def simulation(self):
        
        for timestep in range(timesteps):
            for deer in self.deers:
                deer.move()
                
                

# SIMULATIONS

process = psutil.Process()

times = []
memory = []

for i in range(1,n_simulations+1):
    print(i, " out of ",n_simulations)
    start_time = time.time()
    environment = Environment()
    environment.simulation()
    times.append(time.time() - start_time)
    memory.append(process.memory_info().rss)
    
print(times)
print(memory)

Do you definitely see memory usage increase from iteration to iteration, or are you inferring it (not that it's a bad guess) from the runtime? — slothrop, Jun 29 '23 at 11:31
Good point. I'm inferring that. What would be a good way to test for that? — Peter Kamal, Jun 29 '23 at 12:18
For a quick check, Task Manager (or equivalent on your system) should show you if memory usage is growing to the extent that you run out of physical memory - which is what would usually cause slowness. For more granularity see https://stackoverflow.com/questions/552744/how-do-i-profile-memory-usage-in-python or https://stackoverflow.com/questions/938733/total-memory-used-by-python-process — slothrop, Jun 29 '23 at 12:39
You need to make your example more complex and produce a https://stackoverflow.com/help/minimal-reproducible-example or at least post the full code even if people can't run it, so one could investigate what is going on. What you observe is something very implementation specific. "Leaked memory" (not really a thing in python) can definitely increase runtime if e.g. a dict grows bigger and bigger leading to very long access times. But as said it is not possible to say what is wrong with what you have provided, since these problems really come down to the implementation not the idea. — Nopileos, Jun 29 '23 at 12:53
@slothrop I added a very basic memory component to the code. Memory seems to be increasing, but not by a lot. The Task Manager does not report any anomalies. — Peter Kamal, Jun 30 '23 at 09:44
@Nopileos I added a minimal reproducible example that isolates the problem. See the edit made up top. — Peter Kamal, Jun 30 '23 at 09:44
`self.home_range.append(self.position)` mutates a list embedded inside the global `neighbor_dict`. — slothrop, Jun 30 '23 at 09:49
Could you elaborate on why please? I thought that self.home_range is just its own list when I define it as one entry out of the dictionary. If I then append the position, why does it change the dictionary? — Peter Kamal, Jun 30 '23 at 09:53

slothrop · Accepted Answer · 2023-06-30T10:01:37.097

These lines in the constructor of Deer will be problematic:

self.home_range = neighbor_dict[self.movement_radius][self.position]
self.home_range.append(self.position)

The first line makes the name self.home_range refer to a list object in an inner dictionary of neighbor_dict (a list object originally returned from calling the range_finder function).

Then the second line mutates that list. This means that subsequent retrievals from neighbor_dict will get the latest version of that mutated list, not the value originally returned by range_finder.
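As a minimal sketch of that aliasing (the small `neighbors` dictionary here is a hypothetical stand-in for one inner dictionary of neighbor_dict, not the asker's actual data), assigning a dictionary entry to a new name does not create a new list:

```python
# Hypothetical stand-in for one inner dictionary of neighbor_dict
neighbors = {(0, 0): [(0, 1), (1, 0), (1, 1)]}

home_range = neighbors[(0, 0)]   # binds a second name to the same list object
home_range.append((0, 0))        # mutates that list in place

shared = home_range is neighbors[(0, 0)]
print(shared)                    # True: both names refer to one object
print(neighbors[(0, 0)])         # the dictionary entry now contains (0, 0) too
```

Because both names point at one object, the append is visible through the dictionary as well.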

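Every Deer created at a given position appends to the same shared list, so these lists keep growing from one simulation to the next. That likely causes the slowdown, since cell_choice scans home_range linearly (the membership test and home_range.index) and loops over memory, which has the same length as home_range. It also makes your simulation results incorrect, because deer end up sharing home ranges that contain leftover duplicate entries.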
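The growth can be illustrated with a small sketch. The helper names `make_neighbors` and `home_range_sizes` are made up for this illustration, and the single-cell dictionary only mimics the shape of one neighbor_dict entry; the loop stands in for creating several Deer at the same position.

```python
def make_neighbors():
    # Hypothetical one-cell version of an inner neighbor_dict dictionary
    return {(1, 1): [(0, 0), (0, 1), (0, 2), (1, 0),
                     (1, 2), (2, 0), (2, 1), (2, 2)]}

def home_range_sizes(copy_first, n_deers=3):
    """Return the home-range length seen by each successive 'deer'."""
    neighbors = make_neighbors()
    sizes = []
    for _ in range(n_deers):
        cells = neighbors[(1, 1)]
        # Without the copy, every deer appends to the one shared list
        home_range = cells.copy() if copy_first else cells
        home_range.append((1, 1))
        sizes.append(len(home_range))
    return sizes

aliased_sizes = home_range_sizes(copy_first=False)
copied_sizes = home_range_sizes(copy_first=True)
print(aliased_sizes)  # [9, 10, 11]
print(copied_sizes)   # [9, 9, 9]
```

In the question's code the same thing happens across simulations, because neighbor_dict is global and is never rebuilt between runs.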

You should be able to fix this by making self.home_range refer to a copy of the list. One way to do that is:

self.home_range = neighbor_dict[self.movement_radius][self.position].copy()

There are some alternative syntactic choices for that if you prefer. See "How do I clone a list so that it doesn't change unexpectedly after assignment?".
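For reference, a short sketch of the common ways to make a shallow copy of a list (the `original` list is illustrative only). A shallow copy is enough here, assuming the list holds immutable tuples as in the question; a list of mutable objects might need copy.deepcopy instead.

```python
import copy

original = [(0, 1), (1, 0)]

# Four equivalent ways to get a new list with the same elements
copies = [
    original.copy(),
    list(original),
    original[:],
    copy.copy(original),
]

# Mutating the copies leaves the original untouched
for c in copies:
    c.append((9, 9))

print(original)  # [(0, 1), (1, 0)]
```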

For a summary of how names refer to objects in Python, see also Ned Batchelder's "Facts and myths about Python names and values".

Oh my god thank you so much, what a lifesaver! That solved the problem perfectly on the minimal scale, I'll try and look through my full version if I have similar incidents somewhere and correct it. Thanks again, I appreciate this so much! — Peter Kamal, Jun 30 '23 at 10:02
It works on the big scale! The time even reduces with each simulation in the first couple iterations. Can't thank you enough. — Peter Kamal, Jun 30 '23 at 10:33
