27

In Python the multiprocessing module can be used to run a function over a range of values in parallel. For example, this produces a list of the first 100000 evaluations of f.

def f(i):
    return i * i

def main():
    import multiprocessing
    pool = multiprocessing.Pool(2)
    ans = pool.map(f, range(100000))

    return ans

Can a similar thing be done when f takes multiple inputs but only one variable is varied? For example, how would you parallelize this:

def f(i, n):
    return i * i + 2*n

def main():
    ans = []
    for i in range(100000):
        ans.append(f(i, 20))

    return ans
Mark Bell

5 Answers

45

You can use functools.partial():

def f(i, n):
    return i * i + 2*n

def main():
    import functools
    import multiprocessing
    pool = multiprocessing.Pool(2)
    ans = pool.map(functools.partial(f, n=20), range(100000))

    return ans
mouad
    I know that this is allowed, but why, given that only functions defined at the module top level may be pickled? – BallpointBen May 07 '18 at 17:05
  • Can you clarify a point about using partial? It looks like it ignores the keyword for the argument: if I want to pool.map over the SECOND argument – `partial(f, i=20)` – I get an error: got multiple values for argument `i`. – Mikhail_Sam Jun 21 '19 at 08:35
  • 2
    @Mikhail_Sam https://docs.python.org/2/library/functools.html#functools.partial The function you pass to partial needs to take the varying value as its first positional argument (like 'i' in the for loop), with the remaining keyword arguments after it. All the values of 'i' are supplied as a list/range in the second argument to 'pool.map'. In your example, you provided a value for 'i' inside the partial while the values of 'i' were already coming from the second argument of 'pool.map', leading to the self-explanatory error. – skdhfgeq2134 May 12 '20 at 04:39
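To map over the *second* argument, the trick from the comment above is to fix the first argument positionally rather than by keyword, so `pool.map` fills in the remaining one. A minimal sketch (the `f` here is the one from the question, with i fixed at 20):

```python
import functools
import multiprocessing

def f(i, n):
    return i * i + 2 * n

def main():
    with multiprocessing.Pool(2) as pool:
        # partial(f, 20) fixes the FIRST positional argument (i=20),
        # so pool.map supplies each value from the range as n
        return pool.map(functools.partial(f, 20), range(5))

if __name__ == "__main__":
    print(main())  # f(20, 0), f(20, 1), ... -> [400, 402, 404, 406, 408]
```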
15

There are several ways to do this. In the example given in the question, you could just define a wrapper function

def g(i):
    return f(i, 20)

and pass this wrapper to map(). A more general approach is to have a wrapper that takes a single tuple argument and unpacks the tuple to multiple arguments

def g(tup):
    return f(*tup)

or use an equivalent lambda expression: lambda tup: f(*tup).
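On Python 3.3+ the tuple-unpacking wrapper isn't even needed, since Pool.starmap() unpacks each argument tuple itself. A minimal sketch using the f from the question:

```python
import multiprocessing

def f(i, n):
    return i * i + 2 * n

def main():
    with multiprocessing.Pool(2) as pool:
        # starmap unpacks each (i, 20) tuple into f(i, 20)
        return pool.starmap(f, [(i, 20) for i in range(5)])

if __name__ == "__main__":
    print(main())  # [40, 41, 44, 49, 56]
```

Note that the lambda variant above has the same pickling restriction as any lambda passed to multiprocessing: it works with the builtin map(), but not with Pool.map().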

Sven Marnach
8

If you use my fork of multiprocessing, called pathos, you can get pools that take multiple arguments… and also take lambda functions. The nice thing about it is that you don't have to alter your programming constructs to fit working in parallel.

>>> def f(i, n):
...   return i * i + 2*n
... 
>>> from itertools import repeat
>>> N = 10000
>>>
>>> from pathos.pools import ProcessPool as Pool
>>> pool = Pool()
>>>
>>> ans = pool.map(f, range(1000), repeat(20))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
>>>
>>> # this also works
>>> ans = pool.map(lambda x: f(x, 20), range(1000))
>>> ans[:10]
[40, 41, 44, 49, 56, 65, 76, 89, 104, 121]
Mike McKerns
  • Just installed pathos - much nicer being able to use local functions with closures etc. without any global partials or wrapper funcs or anything else. Thanks for this. – Alex L Oct 15 '21 at 22:45
  • 1
    @AlexL: note that if you want exactly the same interface as `multiprocessing` but with better serialization, you can alternately use `multiprocess` (`pathos` installs it as a dependency). – Mike McKerns Oct 16 '21 at 17:44
4

This technique is known as currying (strictly speaking, partial application): https://en.wikipedia.org/wiki/Currying

Another way to do it, without functools.partial, is to build the argument tuples with the classical map command inside pool.map and unpack them in f:

def f(args):
    x, fixed = args
    # FUNCTIONALITY HERE

pool = multiprocessing.Pool(multiprocessing.cpu_count() - 1)
pool.map(f, map(lambda x: (x, fixed), arguments))
bolzano
-4

You can use poor man's currying (aka wrap it):

new_f = lambda x: f(x, 20)

then call new_f(i).

nmichaels
  • 5
    This will *not* work with multiprocessing's map, because that doesn't support functions that aren't "importable" (using the pickle tool) – cadolphs May 22 '13 at 19:30
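As the comment notes, multiprocessing pickles the callable, and a lambda defined at the call site can't be pickled. A picklable variant of the same "poor man's currying" idea is to define the wrapper at module top level (sketch, using the f from the question):

```python
import multiprocessing

def f(i, n):
    return i * i + 2 * n

def new_f(i):
    # module-level, so pickle can locate it by its import path
    return f(i, 20)

def main():
    with multiprocessing.Pool(2) as pool:
        return pool.map(new_f, range(5))

if __name__ == "__main__":
    print(main())  # [40, 41, 44, 49, 56]
```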