22

I have created a class with a number of methods. One of the methods, my_process, is very time-consuming, and I'd like to run it in parallel. I came across Python Multiprocessing - apply class method to a list of objects but I'm not sure how to apply it to my problem, and what effect it will have on the other methods of my class.

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_results = [obj.my_process(100, 1) for obj in list_of_objects] # multi-process this for-loop

print list_of_numbers
print list_of_results

[0, 1, 2, 3, 4]
[1, 101, 201, 301, 401]
bluprince13

5 Answers

23

I'm going to go against the grain here, and suggest sticking to the simplest thing that could possibly work ;-) That is, Pool.map()-like functions are ideal for this, but are restricted to passing a single argument. Rather than make heroic efforts to worm around that, simply write a helper function that only needs a single argument: a tuple. Then it's all easy and clear.

Here's a complete program taking that approach, which prints what you want under Python 2, and regardless of OS:

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

import multiprocessing as mp
NUM_CORE = 4  # set to the number of cores you want to use

def worker(arg):
    obj, m, a = arg
    return obj.my_process(m, a)

if __name__ == "__main__":
    list_of_numbers = range(0, 5)
    list_of_objects = [MyClass(i) for i in list_of_numbers]

    pool = mp.Pool(NUM_CORE)
    list_of_results = pool.map(worker, ((obj, 100, 1) for obj in list_of_objects))
    pool.close()
    pool.join()

    print list_of_numbers
    print list_of_results

A bit of magic

I should note there are many advantages to taking the very simple approach I suggest. Beyond that it "just works" on Pythons 2 and 3, requires no changes to your classes, and is easy to understand, it also plays nice with all of the Pool methods.
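For example (a quick sketch of mine, reusing worker() and the same argument tuples before the pool is closed):

args = [(obj, 100, 1) for obj in list_of_objects]
async_result = pool.map_async(worker, args)       # non-blocking; collect later
print(async_result.get())
for result in pool.imap_unordered(worker, args):  # results in completion order
    print(result)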

However, if you have multiple methods you want to run in parallel, it can get a bit annoying to write a tiny worker function for each. So here's a tiny bit of "magic" to worm around that. Change worker() like so:

def worker(arg):
    obj, methname = arg[:2]
    return getattr(obj, methname)(*arg[2:])

Now a single worker function suffices for any number of methods, with any number of arguments. In your specific case, just change one line to match:

list_of_results = pool.map(worker, ((obj, "my_process", 100, 1) for obj in list_of_objects))

More-or-less obvious generalizations can also cater to methods with keyword arguments. But, in real life, I usually stick to the original suggestion. At some point catering to generalizations does more harm than good. Then again, I like obvious things ;-)
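For the curious, one such keyword-argument generalization might look like this (just a sketch): pack the positional arguments in a tuple and the keyword arguments in a dict.

def worker(arg):
    obj, methname, args, kwargs = arg
    return getattr(obj, methname)(*args, **kwargs)

list_of_results = pool.map(
    worker,
    ((obj, "my_process", (100,), {"add_to": 1}) for obj in list_of_objects))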

Tim Peters
  • Wouldn't it use however many cores are available if NUM_CORE isn't set? – bluprince13 Apr 05 '17 at 19:47
  • 2
    Sure. That's up to you. However, for a CPU-bound task, it's typical to ask for _fewer_ cores than actually exist, so the OS gets some cycles to run other stuff too. But, again, that's up to you. `mp.cpu_count()` returns the number of cores that exist. – Tim Peters Apr 05 '17 at 19:56
  • @TimPeters For me this code is slower than the non-parallel version (just a for loop). Do you have any benchmarks? – Hirak Sarkar May 05 '22 at 19:27
  • The work done per item in this specific code is trivial - no form of multiprocessing can speed it. The overheads of interprocess communications swamp the benefits when work units are so tiny. This answer wasn't aimed at speeding anything, but at illustrating a simple, general _approach_. – Tim Peters May 05 '22 at 20:28
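Following up on that exchange, a typical way to pick NUM_CORE (a sketch, not from the answer itself) is:

import multiprocessing as mp
NUM_CORE = max(1, mp.cpu_count() - 1)  # leave one core free for the OS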
2

If your class is not "huge", I think a process-oriented approach is better. Pool in multiprocessing is suggested.
This is the tutorial -> https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

Then separate the add_to step from my_process, since it is quick and you can wait until the end of the last process.

from multiprocessing import Pool

def my_process(input, multiply_by):
    return input * multiply_by   # the slow, CPU-bound part (e.g. the question's multiply step)

def add_to(result, amount):
    return result + amount       # the quick part (e.g. the question's add step), done after the pool finishes

p = Pool(5)
res = []
for i in range(10):
    res.append(p.apply_async(my_process, (i, 5)))
p.close()  # no more tasks; must be called before join()
p.join()   # wait for the end of the last process
for i in range(10):
    print add_to(res[i].get(), 1)
Zealseeker
2

Generally the easiest way to run the same calculation in parallel is the map method of a multiprocessing.Pool (or the as_completed function from concurrent.futures in Python 3).

However, the map method applies a function that only takes one argument to an iterable of data using multiple processes.

So this function cannot be a normal method, because a normal method takes at least two arguments; it must also include self! It could be a staticmethod, however. See also this answer for a more in-depth explanation.
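A minimal sketch of the staticmethod variant (my illustration, assuming Python 3, where a staticmethod pickles by its qualified name):

import multiprocessing as mp

class MyClass:
    def __init__(self, input):
        self.input = input

    @staticmethod
    def process_one(obj):
        # one-argument entry point, so it fits Pool.map(); mirrors my_process(100, 1)
        return obj.input * 100 + 1

if __name__ == "__main__":
    objects = [MyClass(i) for i in range(5)]
    with mp.Pool(4) as pool:
        print(pool.map(MyClass.process_one, objects))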

Roland Smith
1

Based on the answer to Python Multiprocessing - apply class method to a list of objects and your code:

  1. wrap the MyClass object in a simulation object

    import multiprocessing
    import os
    import sys

    class simulation(multiprocessing.Process):
        def __init__(self, id, worker, *args, **kwargs):
            # must call this before anything else
            multiprocessing.Process.__init__(self)
            self.id = id
            self.worker = worker
            self.args = args
            self.kwargs = kwargs
            sys.stdout.write('[%d] created\n' % (self.id))
    
  2. run what you want in the run function

        def run(self):
            sys.stdout.write('[%d] running ...  process id: %s\n' % (self.id, os.getpid()))
            self.worker.my_process(*self.args, **self.kwargs)
            sys.stdout.write('[%d] completed\n' % (self.id))
    

Try this:

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]
list_of_sim = [simulation(id=k, worker=obj, multiply_by=100*k, add_to=10*k) \
    for k, obj in enumerate(list_of_objects)]  

for sim in list_of_sim:
    sim.start()
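If the parent should wait for every simulation to finish, join them after starting (a small addition, not in the original answer):

for sim in list_of_sim:
    sim.join()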
Huu-Danh Pham
0

If you don't absolutely need to stick with the multiprocessing module, this can easily be achieved using the concurrent.futures library.

Here's the example code:

from concurrent.futures import ThreadPoolExecutor, wait

MAX_WORKERS = 20

class MyClass():
    def __init__(self, input):
        self.input = input
        self.result = int

    def my_process(self, multiply_by, add_to):
        self.result = self.input * multiply_by
        self._my_sub_process(add_to)
        return self.result

    def _my_sub_process(self, add_to):
        self.result += add_to

list_of_numbers = range(0, 5)
list_of_objects = [MyClass(i) for i in list_of_numbers]

def on_finish(future):
    result = future.result()  # do stuff with your result

with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        executor.submit(obj.my_process, 100, 1).add_done_callback(on_finish)

Here the executor returns a future for every task it submits. Keep in mind that add_done_callback() runs the callback on the thread that finished the task, so a slow callback can hold up that worker; if you really want true parallelism, you should wait for the future objects separately. Here's the code snippet for that.

futures = []
with ThreadPoolExecutor(MAX_WORKERS) as executor:
    for obj in list_of_objects:
        futures.append(executor.submit(obj.my_process, 100, 1))
done, not_done = wait(futures)

for future in done:
    # work with your result here
    if future.exception() is None:
        print(future.result())
    else:
        print(future.exception())

Hope this helps.

Asav Patel
  • Keep in mind that on the standard implementation of Python only one thread at a time can be executing Python bytecode. On this implementation threads will not improve CPU-bound performance. – Roland Smith Apr 02 '17 at 10:18
  • For CPU-bound tasks you just need to swap the `ThreadPoolExecutor` for a `ProcessPoolExecutor`. You'll take a small hit as the processes start up, but after that the workers can execute at the same time. Note that the data that is returned from the sub-processes needs to be pickle-able. – Brad Campbell Apr 03 '17 at 20:23
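Following up on that last comment, a sketch of the swap (my illustration, assuming Python 3, where bound methods such as obj.my_process are picklable, and MyClass as defined in the answer above):

from concurrent.futures import ProcessPoolExecutor, as_completed

if __name__ == "__main__":
    list_of_objects = [MyClass(i) for i in range(5)]
    with ProcessPoolExecutor(max_workers=4) as executor:
        futures = [executor.submit(obj.my_process, 100, 1)
                   for obj in list_of_objects]
        for future in as_completed(futures):
            print(future.result())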