
I have a SimPy model that returns a random result, and I would like to replicate it many times. Each replication is independent, so to speed things up I'd like to run the replications in parallel. I've tried Python's multiprocessing, Pathos multiprocessing, and joblib Parallel, but with each approach I get the same error: `TypeError: can't pickle generator objects`. Is there any way to avoid this error and run the simulation in parallel?

SimPy relies on generators as explained here, so avoiding them isn't possible.
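
A minimal sketch, not from the original post, of where the error comes from: a SimPy process is driven by a generator, and as soon as multiprocessing tries to pickle anything that holds one, it fails. (The `clock` process below is a made-up example.)

import pickle

import simpy


def clock(env):
    # a SimPy "process" is just a generator function that yields events
    while True:
        yield env.timeout(1)


env = simpy.Environment()
proc = env.process(clock(env))  # proc wraps the generator returned by clock(env)

try:
    pickle.dumps(proc)  # this is what multiprocessing does under the hood
except TypeError as err:
    print(err)  # e.g. "cannot pickle 'generator' object" (wording varies by Python version)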

hoffee

2 Answers


The error describes the problem fairly well. Somewhere among the objects being shipped between the parent and the child processes (the function arguments or the returned results), a generator is lurking. Is it possible to convert this generator to a list?

For example, the following raises the error you mention:

from multiprocessing import Pool

def firstn(n):
    k = 0
    while k < n:
        yield k
        k += 1

if __name__ == "__main__":
    p = Pool(2)
    # each worker's result is a generator object, which cannot be pickled to send back
    print(p.map(firstn, [1, 2, 3, 4]))

But this one works:

from multiprocessing import Pool

def firstn(n):
    k = 0
    while k < n:
        yield k
        k += 1

def wrapped(n):
    return list(firstn(n))

if __name__ == "__main__":
    p = Pool(2)
    # the wrapper exhausts the generator into a list, which pickles fine
    print(p.map(wrapped, [1, 2, 3, 4]))
sjp
  • In SimPy, every process is a generator that yields simulation events, so they're not really possible to avoid. – hoffee Oct 02 '19 at 14:04
  • I see. Well, there might still be clever ways to go around them. I edited my answer with an example. – sjp Oct 02 '19 at 14:23
  • I had something like this in mind but I couldn't quite piece it together. I'll give it a shot. – hoffee Oct 02 '19 at 14:39
  • Your example works fine when I run standalone functions in parallel, but I run into the same error when I try to use class methods. Any suggestions? – hoffee Oct 02 '19 at 15:29
  • I was able to break out one of the methods into a function to get it to work for my application, but I'm curious why class methods can't be passed to `map`. – hoffee Oct 02 '19 at 16:28
  • The technical answer is that it's because they cannot be pickled. For a detailed explanation of what can be pickled and why, please see https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled. Class methods interact with the instance state, so the results might be unpredictable if they could be executed in parallel. – sjp Oct 03 '19 at 14:55
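
Following up on the comment thread above, a minimal sketch (not part of either original answer) of the usual workaround: keep the SimPy objects inside the class, but pass only plain, picklable values to `Pool.map` and rebuild the instance inside a module-level wrapper that runs in the child process. The `Simulation` class and `run_one` helper are hypothetical names used purely for illustration.

import multiprocessing as mp
import random

import simpy


class Simulation:
    """Hypothetical model: it owns an Environment and running processes,
    so its instances (and their bound methods) cannot be pickled."""

    def __init__(self, seed):
        random.seed(seed)
        self.env = simpy.Environment()
        self.arrived = 0
        self.env.process(self.arrivals())

    def arrivals(self):
        while True:
            yield self.env.timeout(random.expovariate(1.0))
            self.arrived += 1

    def run(self, until=100):
        self.env.run(until=until)
        return self.arrived


def run_one(seed):
    # Only the plain ``seed`` crosses the process boundary; the Simulation
    # (and all of its generators) is created and run inside the child.
    return Simulation(seed).run()


if __name__ == '__main__':
    with mp.Pool(2) as pool:
        print(pool.map(run_one, range(4)))
        # pool.map(Simulation(0).run, range(4)) would raise the pickling
        # error, because the bound method drags the whole instance along.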

You need to instantiate the Environment from scratch inside the new process, and take care to pass only vanilla, picklable types as the arguments to be mapped in a Pool. Here is a reworked carwash example (the one from the SimPy documentation) that runs 4 parallel simulations with different seeds and prints, for each seed, which car was created last during the run.

import multiprocessing as mp
import simpy
import random


NUM_MACHINES = 2  # Number of machines in the carwash
WASHTIME = 5      # Minutes it takes to clean a car
T_INTER = 7       # Create a car every ~7 minutes
SIM_TIME = 20     # Simulation time in minutes


class Carwash(object):
    """A carwash has a limited number of machines (``NUM_MACHINES``) to
    clean cars in parallel.

    Cars have to request one of the machines. When they get one, they
    can start the washing process and wait for it to finish (which
    takes ``washtime`` minutes).

    """
    def __init__(self, env, num_machines, washtime):
        self.env = env
        self.machine = simpy.Resource(env, num_machines)
        self.washtime = washtime

    def wash(self, car):
        """The washing process. It takes a ``car`` and tries to clean it."""
        yield self.env.timeout(self.washtime)


def car(env, name, cw):
    """The car process (each car has a ``name``) arrives at the carwash
    (``cw``) and requests a cleaning machine.

    It then starts the washing process, waits for it to finish and
    leaves, never to come back ...

    """
    with cw.machine.request() as request:
        yield request
        yield env.process(cw.wash(name))


def setup(env, num_machines, washtime, t_inter):
    """Create a carwash, a number of initial cars and keep creating cars
    approx. every ``t_inter`` minutes."""
    # Create the carwash
    carwash = Carwash(env, num_machines, washtime)

    # Create 4 initial cars
    for i in range(4):
        env.process(car(env, 'Car %d' % i, carwash))

    # Create more cars while the simulation is running
    while True:
        yield env.timeout(random.randint(t_inter - 5, t_inter + 5))
        i += 1
        # stash the index of the latest car on the env so that the child
        # process can report it after the run
        env.i = i
        env.process(car(env, 'Car %d' % i, carwash))


# additional wrapping function to be executed by the pool
def do_simulation_with_seed(rs):

    random.seed(rs)  # This influences only the specific process being run
    env = simpy.Environment()  # THE ENVIRONMENT IS CREATED HERE, IN THE CHILD PROCESS
    env.process(setup(env, NUM_MACHINES, WASHTIME, T_INTER))

    env.run(until=SIM_TIME)

    return env.i  # index of the last car created during this run


if __name__ == '__main__':
    seeds = range(4)
    carwash_pool = mp.Pool(4)
    last_car_by_seed = carwash_pool.map(do_simulation_with_seed, seeds)
    for s, last_car in zip(seeds, last_car_by_seed):
        print('seed={} --> last car created: Car {}'.format(s, last_car))

Lester Jack
  • What you **cannot** do easily is fork an existing Environment mid-flight with deepcopy or pickle for the purpose of tweaking its parameters and doing different scenarios without re-running everything, ideally in parallel. See here: https://stackoverflow.com/questions/58415397/what-is-the-easiest-way-to-copy-a-class-instance-that-contains-simpy-processes – Lester Jack Jan 17 '20 at 10:19
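
To illustrate the limitation described in that last comment (a sketch, not part of the original answers): once an Environment has active processes it holds generators, so deepcopy and pickle fail on it for the same reason `Pool.map` does.

import copy

import simpy


def proc(env):
    while True:
        yield env.timeout(1)


env = simpy.Environment()
env.process(proc(env))
env.run(until=5)  # the process is still alive and scheduled after this

try:
    copy.deepcopy(env)  # deepcopy falls back on the pickle machinery ...
except TypeError as err:
    print(err)  # ... and hits the process generator, e.g. "cannot pickle 'generator' object"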