I'm running an agent-based model in Python 3.9 using object-oriented programming. The model simulates a predator-prey population in a changing landscape. When I run multiple simulations in a for-loop, the runtime of a single simulation increases with each run. I suspect some sort of memory leak, but I haven't been able to track it down.
Here is a sketch of my code:
# Parameters
n_deers = ...
n_wolves = ...
# etc.
# Functions
def some_function(arg):
    pass
# Helper objects
some_dict = ...
# Classes
class Deer:
    pass
class Wolf:
    pass
class Environment:
    def __init__(self):
        self.deers = [Deer(ID=i) for i in range(n_deers)]
        self.wolves = [Wolf(ID=i) for i in range(n_wolves)]
        self.data = pd.DataFrame()
    def simulation(self):
        pass
# Simulations
for i in range(100):
    environment = Environment()
    environment.simulation()
    environment.data.to_csv()
In words: I have global parameters, global functions, and a global dictionary that the class instances use. There is a class for each type of animal, and a class for the environment that creates a certain number of each animal. The environment tracks these animals in a data frame during a single simulation run, in which the animals move, feed, reproduce, die, etc.
My fear is that the animal instances (around 7,000 animals per full-length simulation) are somehow being kept in memory from one run to the next. I don't use static class variables, which this article warns about: https://theorangeone.net/posts/static-vars/ . But of course, it could be anything.
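One way this fear could be tested is to count how many animal instances are still alive after each run. The following is a minimal sketch, not the actual model: the `Deer` class is a stand-in, and `count_instances` is a hypothetical helper built on the standard-library `gc` module. If the count stayed constant across runs, old instances would not be accumulating.

```python
import gc
from collections import Counter

class Deer:
    """Stand-in for the real animal class (hypothetical, for illustration only)."""
    def __init__(self, ID):
        self.ID = ID

def count_instances(*classes):
    """Count live objects of the given classes that the garbage collector tracks."""
    gc.collect()  # drop unreachable reference cycles first
    counts = Counter(type(obj) for obj in gc.get_objects() if type(obj) in classes)
    return {cls.__name__: counts[cls] for cls in classes}

live_counts = []
for run in range(3):
    herd = [Deer(ID=i) for i in range(50)]  # rebinding frees the previous herd
    live_counts.append(count_instances(Deer)["Deer"])
print(live_counts)
```

In the real model, the same helper could be called with `Deer` and `Wolf` after each `environment.simulation()` call.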
Do you have an idea what could be causing this? Any help is greatly appreciated.
EDIT
I have been able (it seems) to isolate the problem. It seems to originate from the animal movement. Here is a minimal reproducible example. To explain: if the animals choose their next position at random from the adjacent cells, the problem does not seem to occur. Once I add memory, home ranges, and the function cell_choice(), each simulation takes longer than the one before. On my machine, with this parametrization, the first simulation takes between 3 and 4 seconds, and the last between 10 and 11.
# MINIMAL MOVEMENT MODEL
# IMPORTS
import random as rd
import numpy as np
import time
import psutil
# REPRODUCIBILITY
rd.seed(42)
# PARAMETERS
landscape_size = 11
n_deers = 100
years = 10
length_year = 360
timesteps = years*length_year
n_simulations = 20
# HELPER FUNCTIONS AND OBJECTS
# Landscape for first initialization
mock_landscape = np.zeros((landscape_size,landscape_size))
# Function to return a list of nxn cells around a given cell
def range_finder(matrix, position, radius):
    adj = []
    lower = 0 - radius
    upper = 1 + radius
    for dx in range(lower, upper):
        for dy in range(lower, upper):
            rangeX = range(0, matrix.shape[0])  # Identifies X bounds
            rangeY = range(0, matrix.shape[1])  # Identifies Y bounds
            (newX, newY) = (position[0]+dx, position[1]+dy)  # Identifies adjacent cell
            if (newX in rangeX) and (newY in rangeY) and (dx, dy) != (0, 0):
                adj.append((newX, newY))
    return adj
# Nested dictionary that contains all sets of neighbors for all possible distances up to half the landscape size
neighbor_dict = {d: {(i,j): range_finder(mock_landscape, (i,j), d)
                     for i in range(landscape_size) for j in range(landscape_size)}
                 for d in range(1,int(landscape_size/2)+1)}
# Function that picks the cell in the home range that was visited longest ago
def cell_choice(position, home_range, memory):
    # These are all the adjacent cells to the current position
    adjacent_cells = neighbor_dict[1][position]
    # This is the subset of the adjacent cells that belong to the home range
    possible_choices = [i for i in adjacent_cells if i in home_range]
    # This yields the "master" indices of those choices
    indeces = []
    for i in possible_choices:
        indeces.append(home_range.index(i))
    # This picks the index with the maximum value in the memory (i.e. visited longest ago)
    memory_values = [memory[i] for i in indeces]
    pick_index = indeces[memory_values.index(max(memory_values))]
    # Resets the memory value of the picked cell to zero
    memory[pick_index] = 0
    # Adds one period to every other index
    other_indeces = [i for i in list(range(len(memory))) if i != pick_index]
    for i in other_indeces:
        memory[i] += 1
    # Returns the picked cell
    return home_range[pick_index]
# CLASS DEFINITIONS
class Deer:
    def __init__(self, ID):
        self.ID = ID
        self.position = (rd.randint(0,landscape_size-1),rd.randint(0,landscape_size-1))
        # Sets up a counter how long the deer has been in the cell
        self.time_spent_in_cell = 1
        # Defines a distance parameter that specifies the radius of the homerange around the base
        self.movement_radius = 1
        # Defines an initial home range around the position
        self.home_range = neighbor_dict[self.movement_radius][self.position]
        self.home_range.append(self.position)
        # Sets up a list of counters how long ago cells in the home range have been visited
        self.memory = [float('inf')]*len(self.home_range)
        self.memory[self.home_range.index(self.position)] = 0
    def move(self):
        self.position = cell_choice(self.position, self.home_range, self.memory)
class Environment:
    def __init__(self):
        self.landscape = np.zeros((landscape_size, landscape_size))
        self.deers = [Deer(ID = i) for i in range(n_deers)]
    def simulation(self):
        for timestep in range(timesteps):
            for deer in self.deers:
                deer.move()
# SIMULATIONS
process = psutil.Process()
times = []
memory = []
for i in range(1,n_simulations+1):
    print(i, " out of ",n_simulations)
    start_time = time.time()
    environment = Environment()
    environment.simulation()
    times.append(time.time() - start_time)
    memory.append(process.memory_info().rss)
print(times)
print(memory)
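One thing that may be worth checking is whether neighbor_dict itself changes between runs. In Deer.__init__, self.home_range is bound to a list taken from neighbor_dict and then appended to; in Python that assignment does not copy the list, so the append also modifies the list stored in the dictionary. The sketch below uses a tiny stand-in dictionary to show how such growth could be measured. The names `lookup` and `total_entries` are hypothetical, and whether this accounts for the whole slowdown in the full model is an assumption.

```python
def total_entries(nested):
    """Sum the lengths of all neighbor lists in a dict of dicts of lists."""
    return sum(len(cells) for by_cell in nested.values() for cells in by_cell.values())

# Tiny stand-in for neighbor_dict (hypothetical data, not the real landscape)
lookup = {1: {(0, 0): [(0, 1), (1, 0)], (0, 1): [(0, 0), (1, 1)]}}

sizes = [total_entries(lookup)]
for run in range(3):
    home_range = lookup[1][(0, 0)]   # binds the same list object, no copy is made
    home_range.append((0, 0))        # so this also grows the list inside lookup
    sizes.append(total_entries(lookup))

# Taking a copy first leaves the stored list untouched
copied = list(lookup[1][(0, 1)])
copied.append((0, 1))
unchanged = len(lookup[1][(0, 1)])

print(sizes, unchanged)
```

In the example above, the same helper could be called as `total_entries(neighbor_dict)` after each simulation and the results compared across runs.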