I am implementing a genetic algorithm using the DEAP framework. The algorithm works, but I noticed that the multi-process version of the GA uses a lot of memory (9 GB, against the 2 GB of the single-process version), and I suspect it is because memory is allocated for each process. In fact, as soon as the map is executed, the memory used increases. Since the data shared among the processes is only read, all of them could access the same memory.
This is the structure of my code.
import multiprocessing

from deap import base

toolbox = base.Toolbox()

def evaluate(individual, dataset=None):
    penalty = dataset.compute(individual)
    return penalty

def initialize():
    dataset = Dataset(file1, file2)
    pool = multiprocessing.Pool()
    toolbox.register("map", pool.map)          # evaluations run in the worker processes
    toolbox.register("evaluate", evaluate, dataset=dataset)
    return toolbox, dataset

def main():
    toolbox, dataset = initialize()
    dataset.data = some_training_set
    fitnesses = toolbox.map(toolbox.evaluate, population)
    dataset.data = some_validation_set
    fitnesses = toolbox.map(toolbox.evaluate, population)
Then I have a class containing the dataset (read using pandas) and a dictionary.
class Dataset:
    def __init__(self, file1, file2):
        self.data = read(file1)          # pandas dataframe
        self.dict = loadpickle(file2)

    def compute(self, individual):
        for row in self.data:
            # some stuff reading row and self.dict
            ...
What is the easiest way to share the memory among the processes? I tried to use global variables for self.data and self.dict (roughly as sketched below), but it made no difference...
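To show what I mean, this is a simplified sketch of the global-variable attempt (read, loadpickle, file1, file2 and population are placeholders from the snippets above; it assumes the globals are filled before the Pool is created, so that the forked workers inherit them):

import multiprocessing

# module-level globals, filled once before the pool is created
shared_data = None
shared_dict = None

def evaluate(individual):
    # workers read the globals instead of receiving the dataset as an argument
    penalty = 0
    for row in shared_data:
        # some stuff reading row and shared_dict
        pass
    return penalty

def main():
    global shared_data, shared_dict
    shared_data = read(file1)        # placeholders, as above
    shared_dict = loadpickle(file2)

    pool = multiprocessing.Pool()    # created only after the globals are set
    fitnesses = pool.map(evaluate, population)

Even with this structure, the memory usage still grows as soon as the map starts.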