Python multiprocessing with start method 'spawn' doesn't work

Question

I wrote a Python class to generate pyplot plots in parallel. It works fine on Linux, where the default start method is fork, but when I tried it on Windows I ran into problems (which can be reproduced on Linux using the spawn start method, see code below). I always end up getting this error:

Traceback (most recent call last):
  File "test.py", line 50, in <module>
    test()
  File "test.py", line 7, in test
    asyncPlotter.saveLinePlotVec3("test")
  File "test.py", line 41, in saveLinePlotVec3
    args=(test, ))
  File "test.py", line 34, in process
    p.start()
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle weakref objects

C:\Python\MonteCarloTools>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\spawn.py", line 99, in spawn_main
    new_handle = reduction.steal_handle(parent_pid, pipe_handle)
  File "C:\Users\adrian\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 82, in steal_handle
    _winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect

I hope there is a way to make this code work on Windows. Here is a link to the different start methods available on Linux and Windows: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods

import multiprocessing as mp
import time  # needed by waitOnPool below
def test():

    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))

    asyncPlotter.saveLinePlotVec3("test")
    asyncPlotter.saveLinePlotVec3("test")

    asyncPlotter.join()


class AsyncPlotter():

    def __init__(self, nc, processes=mp.cpu_count()):

        self.nc = nc
        self.pids = []
        self.processes = processes


    def linePlotVec3(self, nc, processes, test):

        self.waitOnPool(nc, processes)

        print(test)

        nc.value -= 1


    def waitOnPool(self, nc, processes):

        while nc.value >= processes:
            time.sleep(0.1)
        nc.value += 1


    def process(self, target, args):

        ctx = mp.get_context('spawn') 
        p = ctx.Process(target=target, args=args)
        p.start()
        self.pids.append(p)


    def saveLinePlotVec3(self, test):

        self.process(target=self.linePlotVec3,
                       args=(self.nc, self.processes, test))


    def join(self):
        for p in self.pids:
            p.join()


if __name__=='__main__':
    test()

Zach Thompson · Accepted Answer · 2019-07-26T01:53:57.713

When using the spawn start method, the Process object itself is pickled and sent to the child process. In your code, the target=target argument is a bound method of AsyncPlotter. Pickling a bound method also pickles the instance it is bound to, so the entire asyncPlotter instance has to be pickled as well. In the original version of your code that included self.manager, which, judging by the traceback, cannot be pickled.
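
To illustrate the point about bound methods, here is a minimal sketch using plain pickle rather than multiprocessing. The class names and the can_pickle helper are made up for the illustration, and the weak reference is only a stand-in for whatever unpicklable attribute the real object holds:

```python
import pickle
import weakref


class Plain:
    def work(self):
        return "ok"


class HoldsWeakref:
    def __init__(self, other):
        # A weak reference stands in for any unpicklable attribute.
        self.ref = weakref.ref(other)

    def work(self):
        return "ok"


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    plain = Plain()
    holder = HoldsWeakref(plain)
    # A bound method is pickled together with the instance it belongs to.
    print("plain.work picklable: ", can_pickle(plain.work))
    print("holder.work picklable:", can_pickle(holder.work))
```

The method bodies are identical; only the instance state differs, which is why moving unpicklable state out of the instance changes the outcome.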

In short, keep Manager outside of AsyncPlotter. This works on my macOS system:

def test():
    manager = mp.Manager()
    asyncPlotter = AsyncPlotter(manager.Value('i', 0))
    ...

Also, as noted in your comment, asyncPlotter did not work when reused. I don't know the details, but it looks like it has something to do with how the Value object is shared across processes. The test function would need to look like this:

def test():
    manager = mp.Manager()
    nc = manager.Value('i', 0)

    asyncPlotter1 = AsyncPlotter(nc)
    asyncPlotter1.saveLinePlotVec3("test 1")
    asyncPlotter2 = AsyncPlotter(nc)
    asyncPlotter2.saveLinePlotVec3("test 2")

    asyncPlotter1.join()
    asyncPlotter2.join()
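
Another possible explanation for the reuse failure, offered as an assumption and not verified on the asker's setup: process() appends every started Process to self.pids, and because the target is a bound method, the next call pickles self again, this time including the already started Process objects. If that is the cause, making the target a module-level function should allow a single instance to be reused, since the instance is then never pickled. A rough sketch, with hypothetical names (line_plot, save_line_plot) and with the nc throttling left out for brevity:

```python
import multiprocessing as mp


def line_plot(label):
    # Module-level target: pickled by reference, so no plotter state is sent.
    print(label)


class AsyncPlotter:
    def __init__(self):
        self.ctx = mp.get_context("spawn")
        self.procs = []  # started Process objects stay in the parent only

    def save_line_plot(self, label):
        # Only the function reference and the string argument are pickled.
        p = self.ctx.Process(target=line_plot, args=(label,))
        p.start()
        self.procs.append(p)

    def join(self):
        for p in self.procs:
            p.join()


if __name__ == "__main__":
    plotter = AsyncPlotter()
    plotter.save_line_plot("test 1")
    plotter.save_line_plot("test 2")  # second call on the same instance
    plotter.join()
```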

All in all, you might want to restructure your code and use a process pool. It already handles what AsyncPlotter is doing with cpu_count and parallel execution:

from multiprocessing import Pool, set_start_method
from random import random
import time

def linePlotVec3(test):
    time.sleep(random())
    print("test", test)

if __name__ == "__main__":
    set_start_method("spawn")
    with Pool() as pool:
        pool.map(linePlotVec3, range(20))
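
If the inputs are not all known up front, Pool.apply_async can queue tasks one at a time while staying with multiprocessing.Pool. A small sketch under that assumption, where line_plot is a placeholder for the real plotting function:

```python
from multiprocessing import get_context


def line_plot(label):
    # Stand-in for the real plotting work.
    return f"plotted {label}"


def main():
    ctx = get_context("spawn")
    with ctx.Pool() as pool:
        pending = []
        for i in range(5):
            # Each call queues one task right away and returns an AsyncResult.
            pending.append(pool.apply_async(line_plot, (f"test {i}",)))
        # get() blocks until that task is done and re-raises any exception.
        for result in pending:
            print(result.get())


if __name__ == "__main__":
    main()
```

The results are collected inside the with block because leaving the block terminates the pool.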

Or you could use a ProcessPoolExecutor to do pretty much the same thing. This example starts tasks one at a time instead of mapping to a list:

from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp
import time
from random import random

def work(i):
    r = random()
    print("work", i, r)
    time.sleep(r)

def main():
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        for i in range(20):
            pool.submit(work, i)

if __name__ == "__main__":
    main()
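
Note that the example above discards the futures returned by submit, so an exception raised inside work would go unnoticed. A sketch of one way to collect results and errors, assuming the tasks return a value; the failing task and the run_all helper are invented for the illustration:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import multiprocessing as mp


def work(i):
    if i == 3:
        # Deliberate failure, only here to show error handling.
        raise ValueError(f"task {i} failed")
    return i * i


def run_all(n):
    ctx = mp.get_context("spawn")
    results, errors = {}, {}
    with ProcessPoolExecutor(mp_context=ctx) as pool:
        # Map each future back to the input it was submitted with.
        futures = {pool.submit(work, i): i for i in range(n)}
        for fut in as_completed(futures):
            i = futures[fut]
            try:
                results[i] = fut.result()  # re-raises the child's exception
            except ValueError as exc:
                errors[i] = exc
    return results, errors


if __name__ == "__main__":
    results, errors = run_all(6)
    print(sorted(results.items()))
    print(errors)
```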

For a second it looked like you saved my day, but unfortunately your solution works only when I call `asyncPlotter.saveLinePlotVec3("test")` once. If I call it twice in a row I get the same error again. I updated my code with your proposed changes adding the second call of `asyncPlotter` which still fails :/ — Adrian, Jul 25 '19 at 02:57
@Adrian I edited my answer with some new code and a suggestion for another approach. It seems that wrapping the multiprocessing code in a class is causing some rather murky issues with `multiprocessing`. HTH — Zach Thompson, Jul 25 '19 at 11:38
Thanks for your help. I made it work by refactoring my code to use `pool.map()`. It's not as nice because I first have to collect the input for all the plots I want to generate in a list and can't call the plotting function on the fly, but at least the code executes 10 times faster now :) — Adrian, Jul 25 '19 at 22:34
I posted another example that should be more flexible for your case. — Zach Thompson, Jul 26 '19 at 02:05

score 1 · Answer 2 · edited Aug 21 '20 at 02:03

For portability, all objects passed as arguments to a function that will be run in a process must be picklable.
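
One way to check this before starting a process is to run the arguments through ForkingPickler, the pickler that appears in the traceback in the question. This is only a sketch: find_unpicklable is a made-up helper, and the lock is just an example of an object that cannot be pickled.

```python
import threading
from multiprocessing.reduction import ForkingPickler


def find_unpicklable(args):
    """Return the indices of the arguments that cannot be pickled."""
    bad = []
    for index, arg in enumerate(args):
        try:
            ForkingPickler.dumps(arg)
        except Exception:
            # Pickling failures surface as several exception types,
            # so this sketch catches broadly.
            bad.append(index)
    return bad


if __name__ == "__main__":
    print(find_unpicklable(("test", 3)))
    print(find_unpicklable(("test", threading.Lock())))
```

With spawn the same requirement covers the target as well as the arguments, because the whole Process object is pickled. With fork nothing is pickled at start, since the child inherits the parent's memory, which would explain why the same code runs there.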

answered Jul 24 '19 at 22:00 by Roman Miroshnychenko; edited Aug 21 '20 at 02:03 by Asclepius

I pass only a string here to the function that runs in a process. So what is the problem here? And why does it work with the fork start method and not with spawn? – Adrian Jul 24 '19 at 22:03
