
I am attempting to use a partial function so that pool.map() can target a function that has more than one parameter (in this case a Lock() object).

Here is example code (taken from an answer to a previous question of mine):

from functools import partial
import multiprocessing

def target(lock, iterable_item):
    # Do cool stuff with iterable_item
    if (... some condition here ...):
        lock.acquire()
        # Write to stdout or logfile, etc.
        lock.release()

def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    l = multiprocessing.Lock()
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

However, when I run this code, I get the error:

RuntimeError: Lock objects should only be shared between processes through inheritance

What am I missing here? How can I share the lock between my subprocesses?

DJMcCarthy12
  • There's another question about this same issue, though their particular error is different - [Trouble using a lock with multiprocessing.Pool: pickling error](http://stackoverflow.com/questions/17960296/trouble-using-a-lock-with-multiprocessing-pool-pickling-error) – Rob Watts Aug 28 '14 at 21:03

2 Answers


You can't pass normal multiprocessing.Lock objects to Pool methods, because they can't be pickled. There are two ways to get around this. One is to create a Manager() and pass a Manager.Lock():

def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    m = multiprocessing.Manager()
    l = m.Lock()  # a picklable proxy to a lock hosted in the Manager's server process
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()

This is a little bit heavyweight, though: using a Manager requires spawning another process to host the Manager server, and every call to acquire/release the lock has to be sent to that server via IPC.

The other option is to pass the regular multiprocessing.Lock() at Pool creation time, using the initializer kwarg. This will make your lock instance global in all the child workers:

def target(iterable_item):
    # Do cool stuff with iterable_item
    if (... some condition here ...):
        lock.acquire()
        # Write to stdout or logfile, etc.
        lock.release()

def init(l):
    global lock
    lock = l

def main():
    iterable = [1, 2, 3, 4, 5]
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(initializer=init, initargs=(l,))
    pool.map(target, iterable)
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()

The second solution has the side effect of no longer requiring partial.
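One of the commenters below mentions installing a Queue through the same initializer. Here is a minimal sketch of that variation (not part of the original answer; the doubling inside target is purely illustrative), sharing both a Lock and a Queue through initargs:

import multiprocessing

def init(l, q):
    # Install the inherited Lock and Queue as globals in each worker.
    global lock, queue
    lock = l
    queue = q

def target(item):
    with lock:                # the same primitive in every worker
        queue.put(item * 2)   # illustrative work product

def main():
    l = multiprocessing.Lock()
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=init, initargs=(l, q))
    pool.map(target, [1, 2, 3, 4, 5])
    pool.close()
    pool.join()
    results = [q.get() for _ in range(5)]
    print(sorted(results))  # [2, 4, 6, 8, 10]

if __name__ == '__main__':
    main()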

dano
  • Well thank you again, sir. This looks like exactly what I need. Really appreciate the continued help! Other options looked super involved. I will go with the initializer function to share the global Lock. – DJMcCarthy12 Aug 28 '14 at 21:47
  • This worked great. I also put a `Queue` into the init to save passing that in each call. – fantabolous Aug 17 '15 at 15:11
  • 2
    @dano thanks a lot for your answer, I too had the same query and this one solves it perfectly, however I have another query that why is this approach not used frequently to share state between processes rather than doing so via a Manager object which has its own overhead of running a server processes and proxied access ? – bawejakunal Feb 22 '16 at 17:00
  • 7
    @bawejakunal `multiprocessing.Lock` is a process-safe object, so you can pass it directly to child processes and safely use it across all of them. However, most mutable Python objects (like `list`, `dict`, most user-created classes) are *not* process safe, so passing them between processes leads to completely distinct copies of the objects being created in each process. In those cases, you need to use `multiprocessing.Manager`. – dano Feb 22 '16 at 20:37
  • 1
    @dano thanks, so I have a `list of locks` and a `list of browser objects` that I need to share as global variables among processes, how would you suggest doing that ? I tried using `Manager` object but that gives an error about non-serializability of `thread.lock` while returning `Manager.Lock()` objects from the `list of locks`. Sample code here: https://codeshare.io/dnhhw – bawejakunal Feb 23 '16 at 03:31
  • 2
    Could you explain why Lock needs to be created as global variable when using Pool, but can be passed as argument when using Process? – lineil Mar 15 '17 at 21:56
  • 11
    @neilxdims With a `Process`, the Lock is inherited by the child process as it is forked, and passed directly into whatever method you passed as the target of the Process. With a `Pool`, when you pass arguments to methods like `map` and `apply`, the processes are already forked, so inheritance won't work. However the arguments to the `Pool`'s `initializer` function are inherited, so the `Lock` can be passed. But, for the methods the pool processes execute to have access to the `Lock`, it needs to be global; otherwise it will be inaccessible outside the `initializer` function. – dano Mar 17 '17 at 01:04
  • 5
    Why can't you simply declare lock as global and access it from within each pool function? – pistacchio Oct 26 '17 at 06:26
  • 6
    @pistacchio On Linux platforms, you can do that. On Windows, you can't, because Windows doesn't support forking. You'll end up with a different lock object in each child process. – dano Oct 26 '17 at 13:02
  • For anyone requiring related sample code for `joblib`'s `Parallel`, see this link: https://github.com/davidheryanto/etc/blob/master/python-recipes/parallel-joblib-counter.py – Dan Nissenbaum Jan 13 '18 at 22:03
  • Should you pass the logger to your processes via Pool's initializer as well? – Jay Sep 08 '20 at 14:14
  • 1
    When using the second approach, it might be worth explicitly calling `multiprocessing.set_start_method('fork')` to make it more explicit that sharing locks/barriers/etc without a Manager process is relying on 'nix systems defaulting to fork, especially as the macOS default start method changed in Python 3.8: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods – ncoghlan May 11 '22 at 07:30
  • 1
    Alternative option to force forking (or an error if it isn't available) that doesn't have a global side effect: `mp_ctx = multiprocessing.get_context('fork')`. Then use the returned multiprocessing context to create the `Pool` and `Lock`/`Barrier` objects rather than the top level multiprocessing module methods. – ncoghlan May 13 '22 at 00:51

Here's a version (using a Barrier instead of a Lock, but you get the idea) that also works on Windows, where the missing fork causes additional trouble:

import multiprocessing as mp

def procs(uid_barrier):
    uid, barrier = uid_barrier
    print(uid, 'waiting')
    barrier.wait()
    print(uid, 'past barrier')    

def main():
    N_PROCS = 10
    with mp.Manager() as man:
        barrier = man.Barrier(N_PROCS)
        with mp.Pool(N_PROCS) as p:
            p.map(procs, ((uid, barrier) for uid in range(N_PROCS)))

if __name__ == '__main__':
    mp.freeze_support()
    main()
Tom Pohl