Is there a simple way to use multiprocessing to do the equivalent of this?

for sim in sim_list:
  sim.run()

where the elements of sim_list are "simulation" objects and run() is a method of the simulation class which does modify the attributes of the objects. E.g.:

import subprocess

class simulation:
    def __init__(self):
        self.state = {'done': False}
        self.cmd = "program"
    def run(self):
        subprocess.call(self.cmd)
        self.state['done'] = True

All the sims in sim_list are independent, so the strategy does not have to be thread safe.

I tried the following, which is obviously flawed because the argument is passed by deepcopy and is not modified in-place.

from multiprocessing import Process

for sim in sim_list:
  b = Process(target=simulation.run, args=[sim])
  b.start()
  b.join()
mece1390
calys
  • You don't want to join() your processes inside the loop, or they will run one after the other instead of in parallel. To answer your question, you could pass a multiprocessing.Queue object when starting the Process and then put self in the queue when done. – Fredrik Håård Apr 03 '13 at 15:18
  • Ok for the comment about join(). Regarding the use of Queue, I am not sure how this is supposed to work. Aren't my sim objects going to be passed through deepcopy anyway? – calys Apr 03 '13 at 15:58
  • @calys On Windows you will get a `PicklingError` because you are trying to pickle a method. On UNIX there is no "deepcopy"; each process simply obtains a perfect copy of the whole address space. You have to replace the change of state in the instance with some explicit interprocess communication. – Bakuriu Apr 03 '13 at 16:01
  • @Bakuriu Thanks. I wasn't aware that Process would use pickle. I'll look into interprocess communications. – calys Apr 03 '13 at 16:31
  • @Bakuriu I cannot find a way to make interprocess communications work without having to define another run() function (where I manually put each of the attributes of the simulation class into a result_queue, passed as an argument to Process()). This is not elegant at all and is very error prone if the simulation class is large. – calys Apr 03 '13 at 21:17

2 Answers

One way to do what you want is to have your computing class (simulation in your case) be a subclass of Process. When initialized properly, instances of this class will run in separate processes and you can set off a group of them from a list just like you wanted.

Here's an example, building on what you wrote above:

import multiprocessing
import os
import random
import sys

class simulation(multiprocessing.Process):
    def __init__(self, name):
        # must call this before anything else
        multiprocessing.Process.__init__(self)

        # then any other initialization
        self.name = name
        self.number = 0.0
        sys.stdout.write('[%s] created: %f\n' % (self.name, self.number))

    def run(self):
        sys.stdout.write('[%s] running ...  process id: %s\n' 
                         % (self.name, os.getpid()))

        self.number = random.uniform(0.0, 10.0)
        sys.stdout.write('[%s] completed: %f\n' % (self.name, self.number))

Then just make a list of objects and start each one with a loop:

sim_list = []
sim_list.append(simulation('foo'))
sim_list.append(simulation('bar'))

for sim in sim_list:
    sim.start()

When you run this you should see each object run in its own process. Don't forget to call Process.__init__(self) as the very first thing in your class initialization, before anything else.

Obviously I've not included any interprocess communication in this example; you'll have to add that if your situation requires it (it wasn't clear from your question whether you needed it or not).
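
If you do need results back in the parent, one option is to hand each object a shared queue. The following is only a minimal sketch under that assumption: `QueueSimulation`, `results` and `run_all` are hypothetical names that are not part of the example above, and the random number stands in for real work.

```python
import multiprocessing
import random


class QueueSimulation(multiprocessing.Process):
    """Hypothetical variant of the simulation class that reports its result."""

    def __init__(self, name, results):
        # Initialize the Process machinery before setting our own attributes.
        super().__init__(name=name)
        self.results = results  # shared multiprocessing.Queue (assumed design)
        self.number = 0.0

    def run(self):
        # Runs in the child process. Changes to self stay in the child,
        # so the result is sent back explicitly through the queue.
        self.number = random.uniform(0.0, 10.0)
        self.results.put((self.name, self.number))


def run_all(names):
    results = multiprocessing.Queue()
    sims = [QueueSimulation(name, results) for name in names]
    for sim in sims:
        sim.start()
    # Collect one result per simulation before joining, so that no child
    # is left blocked while flushing its queued data.
    collected = dict(results.get() for _ in sims)
    for sim in sims:
        sim.join()
    return collected


if __name__ == "__main__":
    print(run_all(["foo", "bar"]))
```

Note that `sim.number` in the parent keeps its initial value after `start()`; only what is put on the queue crosses the process boundary.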

This approach works well for me, and I'm not aware of any drawbacks. If anyone knows of hidden dangers which I've overlooked, please let me know.

I hope this helps.

DMH
  • Let's say `simulation` had another method to modify the instance variables inside it, how would you access that method after you've started the sim? – Francis May 15 '19 at 11:10
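For those working with large data sets, a process pool with a lazy iterator such as imap is a better fit than one process per object. The pool works on pickled copies, so the worker function has to return the object. This assumes plain, picklable simulation objects with a run() method, as in the question, not Process subclasses:

import multiprocessing as mp

def run_sim(sim):
    sim.run()
    return sim  # hand the modified copy back to the parent

with mp.Pool(mp.cpu_count()) as pool:
    sim_list = list(pool.imap(run_sim, sim_list))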
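
As a fuller, self-contained sketch of the pool approach: the `Simulation` class below is a hypothetical stand-in for the class in the question (its `run()` only flips a flag instead of calling an external program), and `run_sim` is an assumed helper name. It illustrates that the objects in the parent are left untouched, while the copies returned by the workers carry the changes.

```python
import multiprocessing as mp


class Simulation:
    """Stand-in for the question's class; a real run() would call a program."""

    def __init__(self, name):
        self.name = name
        self.state = {"done": False}

    def run(self):
        self.state["done"] = True


def run_sim(sim):
    # Executed in a worker process on a pickled copy of sim.
    sim.run()
    return sim  # the modified copy is pickled back to the parent


if __name__ == "__main__":
    originals = [Simulation("foo"), Simulation("bar")]
    with mp.Pool(2) as pool:
        # imap yields results lazily, in input order; list() consumes them.
        finished = list(pool.imap(run_sim, originals))
    print([s.state["done"] for s in originals])  # [False, False]
    print([s.state["done"] for s in finished])   # [True, True]
```

The worker function is defined at module level so that it can be pickled and sent to the pool.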
Alec Gerona