
Problem description

TL;DR

How can I implement a decorator that doesn't break pickling?

Goal

multiprocessing.Pool can be used to split data into chunks and distribute them to worker processes, data-parallelising a given function. I'd like to use such an approach within a decorator for user-friendly data-parallelisation. The decorator would typically look like the following:

from multiprocessing import Pool
from functools import partial, wraps

def deco_data_parallel(func):
    @wraps(func)
    def to_parallel(arg, **kwargs):
        part_func = partial(func, **kwargs)
        tot = 0
        with Pool() as p:
            for output in p.imap_unordered(part_func, arg):
                tot += output
        return tot
    return to_parallel

The above implementation imposes the following conditions on the function to be parallelised. These limitations can very likely be overcome with a better design.

  • arg is an iterable to be split into chunks
  • The fixed arguments must be passed as keyword arguments

Here is an example of the intended use:

@deco_data_parallel
def compute(data, arg1, arg2):
    return data + arg1 + arg2

if __name__ == "__main__":
    # Dummy data
    data = [4]*100000

    # Fixed arguments must be passed as keyword arguments
    compute(data, arg1=1, arg2=2)
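    # Expected result: 100000 * (4 + 1 + 2) = 700000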

Error

The function fed to imap_unordered must be picklable. The decorator seems to break the picklability of the original function:

_pickle.PicklingError: Can't pickle <function compute at 0x1040137a0>: it's not the same object as __main__.compute    

Best solution

I first thought that @wraps was the problem: if the decorated function presents itself as the original one, then the latter can't be found by the pool workers. But it turns out that the @wraps decorator doesn't have any effect: the error occurs with or without it.

Thanks to that great post, I could come up with the following non-optimal solution: create a new top-level function object by applying the decorator explicitly, as follows. This works because compute itself keeps its original top-level name, so the worker processes can still find it. It partially breaks the user-friendliness and is therefore not satisfying, but it nevertheless fulfils the expected purpose.

# Beware: the new name must differ from the original function's name
compute_ = deco_data_parallel(compute)

if __name__ == "__main__":
    ...
    compute_(data, arg1=1, arg2=2)

Questions

  • How can the picklability problem be solved in an elegant way, so that the user can simply decorate the function to be parallelised?
  • Why doesn't the @functools.wraps decorator have any effect?
  • What does the error it's not the same object as __main__.compute actually mean? I.e. in what sense exactly am I breaking the pickling process?

My configuration

MacPorts' Python 3.7.7 on macOS 10.14.6

Disclaimer

I'm quite new to the world of parallel computing in Python, as well as to the world of Python decorators. Heresy is very likely to have happened in this post, and I apologise for that!

This is also only my second question on Stack Overflow; any suggestion for improvement is welcome.



Detailed investigation

Convinced it could be made to work, I tried multiple decorating strategies. This very complete post on decorators was a great guide. This blog post gave me some hope that an object-oriented decorating strategy could make things work: the author indeed claims that it fixed their picklability problem.

All of the following approaches have been tested, with and without @wraps, and they all lead to the same _pickle.PicklingError. I'm beginning to feel that I've tried all the non-hacky possibilities Python has to offer, and it would be a great pleasure to be proven wrong!

Functional approach

The simplest approach is the one I showed above. For decorators with arguments, a "decorator factory" can be used as well. Let's use the number of worker processes as the argument, for the sake of the example.

def factory_data_parallel(nproc=4):
    def deco_data_parallel(func):
        @wraps(func)
        def to_parallel(arg, **kwargs):
            part_func = partial(func, **kwargs)
            tot = 0
            with Pool(nproc) as p:
                for output in p.imap_unordered(part_func, arg):
                    tot += output
            return tot
        return to_parallel
    return deco_data_parallel

# Usage: only with an argument (or at least with parentheses)
@factory_data_parallel(8)
def compute(data, arg1, arg2):
    ...

A hybrid form, usable either as a simple decorator or as a decorator factory, can be implemented as follows; it dispatches on whether it was handed the function directly:

def factorydeco_data_parallel(_func=None, *, nproc=4):
    def deco_data_parallel(func):
        ...

    if _func is None:
        return deco_data_parallel
    else:
        return deco_data_parallel(_func)

# Usage as a factory (with argument)
@factorydeco_data_parallel(8)
def compute(data, arg1, arg2):
    ...

# Usage as a simple decorator
@factorydeco_data_parallel
def other_compute(data, arg1, arg2):
    ...

Object-oriented approach

From my understanding, a decorator can be any callable object. A simple decorator using an object can be implemented as follows. The first version is applied with parentheses (explicit creation of the object when decorating), while the second one is used as a standard decorator.

class Class_data_parallel(object):
    def __call__(self, func):
        self.orig_func = func

        @wraps(func)
        def to_parallel(arg, **kwargs):
            # Does it make a difference to use the argument func instead?
            part_func = partial(self.orig_func, **kwargs)
            tot = 0
            with Pool() as p:
                for output in p.imap_unordered(part_func, arg):
                    tot += output
            return tot
    
        return to_parallel

class Class_data_parallel_alt(object):
    def __init__(self, func):
        self.orig_func = func

    # Problem: no way I'm aware of to use @wraps here
    def __call__(self, arg, **kwargs):
        part_func = partial(self.orig_func, **kwargs)
        tot = 0
        with Pool() as p:
            for output in p.imap_unordered(part_func, arg):
                tot += output
        return tot

# Usage: with parentheses
@Class_data_parallel()
def compute(data, arg1, arg2):
    ...

# Usage: without parentheses
@Class_data_parallel_alt
def other_compute(data, arg1, arg2):
    ...
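
As an aside on that limitation: functools.update_wrapper, the function @wraps is built on, can be called directly in the constructor to copy the metadata onto the instance. A minimal sketch (it only restores the metadata; it doesn't fix the pickling error):

from functools import update_wrapper

class Class_data_parallel_alt(object):
    def __init__(self, func):
        self.orig_func = func
        # Copies __name__, __doc__, etc. from func onto the instance,
        # playing the role @wraps plays for function wrappers
        update_wrapper(self, func)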

An obvious extension of the first case would be to add some parameters to the constructor; the class would then play the role of a decorator factory, as sketched below.
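
A minimal sketch of that extension (the class name is made up for illustration; like all the other variants, it still raises the same PicklingError when called):

from multiprocessing import Pool
from functools import partial, wraps

class Class_data_parallel_factory(object):
    def __init__(self, nproc=4):
        # The constructor receives the decoration parameters...
        self.nproc = nproc

    def __call__(self, func):
        # ...and __call__ receives the function to decorate
        @wraps(func)
        def to_parallel(arg, **kwargs):
            part_func = partial(func, **kwargs)
            tot = 0
            with Pool(self.nproc) as p:
                for output in p.imap_unordered(part_func, arg):
                    tot += output
            return tot
        return to_parallel

# Usage: with constructor arguments
@Class_data_parallel_factory(nproc=8)
def compute(data, arg1, arg2):
    ...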

Some more thinking

  • As I mentioned, @wraps was a candidate for being both the cause of and the solution to the problem. Using it or not doesn't change anything.
  • The use of partial for handling the constant arguments (i.e. arguments that are constant across processes, arg1 and arg2 in my examples) could be a problem, but I doubt it. I could use the initializer argument of the Pool() constructor instead, as sketched after this list.
  • A. Sherman and P. Den Hartog did achieve that goal in their DECO parallel model. However, I'm not able to understand how they overcame my problem. It seems to prove that what I want to do is not a fundamental limitation of decorators.
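
A minimal sketch of the initializer idea (the names _init and _consts are made up for illustration; the constant arguments are installed once per worker process, so no partial needs to be pickled):

from multiprocessing import Pool

_consts = {}

def _init(arg1, arg2):
    # Runs once in each worker process and stores the constant
    # arguments at module level
    _consts["arg1"] = arg1
    _consts["arg2"] = arg2

def compute(data):
    return data + _consts["arg1"] + _consts["arg2"]

if __name__ == "__main__":
    data = [4] * 100000
    with Pool(initializer=_init, initargs=(1, 2)) as p:
        print(sum(p.imap_unordered(compute, data)))  # 700000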

jmon12

1 Answer


You're trying to have the workers execute partial(func, **kwargs), where func is the undecorated function. For this to work, the workers have to be able to find func by module and qualified name, but func is no longer where its name suggests it should be: the to_parallel wrapper is there instead. This gets detected when the master process tries to pickle func.
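
The mechanism can be reproduced without multiprocessing at all. A minimal sketch (pickle serialises functions by reference, so it looks up __main__.compute and checks that it finds the very same object):

import pickle
from functools import wraps

def deco(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@deco
def compute(x):
    return x

# The name __main__.compute is now bound to the wrapper. Pickling
# the original function object therefore fails the identity check.
original = compute.__wrapped__  # reference kept by @wraps
pickle.dumps(original)  # PicklingError: it's not the same object as __main__.compute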

I wouldn't use a decorator for this. The options I see for a decorator all interfere with something else, like documentation generation or decorator composability.
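
For instance, a plain helper keeps the call site almost as light without touching the function's identity. A minimal sketch (parallel_sum is a made-up name, not a library function):

from functools import partial
from multiprocessing import Pool

def compute(data, arg1, arg2):
    return data + arg1 + arg2

def parallel_sum(func, arg, nproc=None, **kwargs):
    # func stays a plain top-level function, so the workers can
    # pickle it by module and qualified name without any conflict
    part_func = partial(func, **kwargs)
    with Pool(nproc) as p:
        return sum(p.imap_unordered(part_func, arg))

if __name__ == "__main__":
    data = [4] * 100000
    print(parallel_sum(compute, data, arg1=1, arg2=2))  # 700000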

user2357112
  • Are you suggesting that there is no way to achieve what I want with decorators? What is the fundamental limitation? I thought that using a class would make it possible to keep track of the original function and call it explicitly instead of the wrapper returned by the decorator. But a class attribute is not picklable... – jmon12 Mar 29 '20 at 08:46
  • What I want to do seems possible. I edited my post to add a link to A. Sherman's and P. Den Hartog's [DECO](https://github.com/alex-sherman/deco): they achieved what I'd like to do, but I'm not able to understand how. – jmon12 Mar 29 '20 at 08:51
  • @jmon12: They use a global dictionary mapping function names to functions, and a dispatcher that looks up functions in this dictionary. Their implementation breaks if you try to give two `concurrent`-decorated functions the same name, even in completely unrelated modules. That part's simple enough to fix - use module and qualname instead of just name - but there are other limitations, like the very restrictive limits on what you can do in a `synchronized`-decorated function. – user2357112 Mar 29 '20 at 09:14
  • DECO was a class project, and it really shows. It's buggy and so limited that it doesn't really offer any advantages over using `multiprocessing` directly. – user2357112 Mar 29 '20 at 09:42
  • I didn't try it out, but the goal is interesting: how can multiprocessing be used in Python in a non-invasive way? It's quite seductive to imagine that the user only needs to add a decorator for that purpose. – jmon12 Mar 29 '20 at 09:56