5

I am using the multiprocessing.Pool class within an object and attempting the following:

from multiprocessing import Lock, Pool

class A:
    def __init__(self):
        self.lock = Lock()
        self.file = open('test.txt')
    def function(self, i):
        self.lock.acquire()
        line = self.file.readline()
        self.lock.release()
        return line
    def anotherfunction(self):
        pool = Pool()
        results = pool.map(self.function, range(10000))
        pool.close()
        pool.join()
        return results

However I am getting a runtime error stating that lock objects should only be shared between processes through inheritance. I am fairly new to Python and multiprocessing. How can I get put on the right track?

Géry Ogam
  • 6,336
  • 4
  • 38
  • 67
Filibuster
  • 51
  • 1
  • 3
  • always put full error message (starting at word "Traceback") in question (not comment) as text (not screenshot, not link to external portal). There are other useful information. – furas Nov 10 '21 at 13:49
  • maybe you should send `self.Lock` to `self.function` as second argument. – furas Nov 10 '21 at 13:50
  • you forgot `self` in functions definitions. - `def function(self)`, `def anotherfunction(self)` – furas Nov 10 '21 at 13:51
  • For furture readers, it is possible to create picklable locks that work with pools, see this answer https://stackoverflow.com/a/75247561/16310741 – Charchit Agarwal Feb 19 '23 at 14:44

1 Answers1

9

multiprocessing.Lock instances can be attributes of multiprocessing.Process instances. When a process is created in the main process with a lock attribute, the lock exists in the main process’s address space. When the process’s start method is invoked and runs a subprocess which invokes the process’s run method, the lock has to be serialized/deserialized to the subprocess address space. This works as expected:

from multiprocessing import Lock, Process

class P(Process):
    def __init__(self, *args, **kwargs):
        Process.__init__(self, *args, **kwargs)
        self.lock = Lock()
    def run(self):
        print(self.lock)

if __name__ == '__main__':
    p = P()
    p.start()
    p.join()

Prints:

<Lock(owner=None)>

Unfortuantely, this does not work when you are dealing with multiprocessing.Pool instances. In your example, self.lock is created in the main process by the __init__ method. But when Pool.map is called to invoke self.function, the lock cannot be serialized/deserialized to the already-running pool process that will be running this method.

The solution is to initialize each pool process with a global variable set to this lock (there is no point in having this lock being an attribute of the class now). The way to do this is to use the initializer and initargs parameters of the pool __init__ method. See the documentation:

from multiprocessing import Lock, Pool

def init_pool_processes(the_lock):
    '''Initialize each process with a global variable lock.
    '''
    global lock
    lock = the_lock

class Test:
    def function(self, i):
        lock.acquire()
        with open('test.txt', 'a') as f:
            print(i, file=f)
        lock.release()
    def anotherfunction(self):
        lock = Lock()
        pool = Pool(initializer=init_pool_processes, initargs=(lock,))
        pool.map(self.function, range(10))
        pool.close()
        pool.join()

if __name__ == '__main__':
    t = Test()
    t.anotherfunction()
Géry Ogam
  • 6,336
  • 4
  • 38
  • 67
Booboo
  • 38,656
  • 3
  • 37
  • 60
  • Thanks, I fixed the first 4 issues you cited. Sorry for the confusion. In your code you listed, why is lock defined as global in init_pool_processes after it is instantiated in anotherfunction()? – Filibuster Nov 10 '21 at 21:09
  • `anotherfunction` is executing in the main process and `function` is a "worker function" that is invoked as a result of the call to `pool.map` by one of the processes in the multiprocessing pool you created and is therefore running in a completely different process/address space. You need to pass the `lock` instance from one address space to another and the only way to do that when dealing with a multiprocessing pool is to initialize for each process in the pool a global variable with the lock. This is how you initialize "by inheritance". Read the documentation at the link I posted. – Booboo Nov 10 '21 at 21:39
  • So is the worker function a static copy of the class method? – Filibuster Nov 12 '21 at 03:14
  • On Windows to launch a new process a new, empty address space is created and a new Python interpreter is launched that re-reads the source file re-creating the class definition and its methods in the new address space before the worker function can be invoked. So if that is what you mean by a "static copy of the class method", then yes. – Booboo Nov 12 '21 at 10:47
  • If this answers your question, feel free to *Accept* the answer. – Booboo Sep 28 '22 at 17:26