I am implementing a genetic algorithm using the DEAP framework. The algorithm works, but I noticed that the multi-process version of the GA uses a lot of memory (9 GB, against the 2 GB of the single-process version), and I suspect it is because memory is allocated for each process. In fact, as soon as the map is executed, the memory used increases. Since the data shared among the processes is only read, all of them could access the same memory.
This is the structure of my code.
import multiprocessing

from deap import base

toolbox = base.Toolbox()

def evaluate(individual, dataset=None):
    penalty = dataset.compute(individual)
    return penalty

def initialize():
    dataset = Dataset(file1, file2)
    pool = multiprocessing.Pool()
    toolbox.register("map", pool.map)          # evaluations run in the worker processes
    toolbox.register("evaluate", evaluate, dataset=dataset)
    return toolbox, dataset

def main():
    toolbox, dataset = initialize()
    dataset.data = some_training_set
    fitnesses = toolbox.map(toolbox.evaluate, population)
    dataset.data = some_validation_set
    fitnesses = toolbox.map(toolbox.evaluate, population)
Then I have a class containing the dataset (read using pandas) and a dictionary.
class Dataset:
    def __init__(self, file1, file2):
        self.data = read(file1)          # pandas dataframe
        self.dict = loadpickle(file2)

    def compute(self, individual):
        for row in self.data:
            # some stuff reading row and self.dict
            ...
What is the easiest way to share the memory among the processes? I tried to use global variables for self.data and self.dict (roughly as sketched below), but it made no difference...
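To show what I mean, this is a simplified sketch of the global-variable attempt (read, loadpickle, file1, file2 and population are placeholders from the snippets above; it assumes the globals are filled before the Pool is created, so that the forked workers inherit them):

import multiprocessing

# module-level globals, filled once before the pool is created
shared_data = None
shared_dict = None

def evaluate(individual):
    # workers read the globals instead of receiving the dataset as an argument
    penalty = 0
    for row in shared_data:
        # some stuff reading row and shared_dict
        pass
    return penalty

def main():
    global shared_data, shared_dict
    shared_data = read(file1)        # placeholders, as above
    shared_dict = loadpickle(file2)

    pool = multiprocessing.Pool()    # created only after the globals are set
    fitnesses = pool.map(evaluate, population)

Even with this structure, the memory usage still grows as soon as the map starts.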