
I have attempted to use `Pool.starmap` in a few different ways. I have tried various suggestions and answers, to no avail. Below is a sample of the code I am trying to run; however, it hangs and never terminates. What am I doing wrong here?

Side note: I am on Python version 3.9.8

if __name__ == '__main__':
    with get_context("spawn").Pool() as p:
        tasks = [(1,1),(2,2),(3,3)]
        print(p.starmap(add,tasks))
        p.close()
        p.join()
    When using the `with` context for `Pool`, you don't need to `close` or `join` the pool. That's what the `with` context does for you. – Aaron Nov 11 '21 at 04:17

1 Answer


Multiprocessing in Python has some complexity you should be aware of: its behavior depends on how you run your script, in addition to which OS and Python version you're using.
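If you want to see what your particular environment supports, the standard library can tell you directly. Here's a minimal sketch (nothing in it is specific to your code) for checking the available and default start methods before digging further:

import multiprocessing as mp
import sys

if __name__ == "__main__":
    print(sys.version)                 # the Python version actually running the script
    print(mp.get_all_start_methods())  # e.g. ['spawn'] on Windows; ['fork', 'spawn', 'forkserver'] on Linux
    print(mp.get_start_method())       # the default start method for this platform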

One of the big issues I see very often is that Jupyter and other "notebook"-style Python environments don't always play nice with multiprocessing. There are technically some ways around this, but I typically just suggest executing your code from a normal system terminal. The common thread is that "interactive" interpreters don't work very well, because there needs to be a "main" file, and in interactive mode there is no file; the interpreter just waits for user input.
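For what it's worth, the usual workaround for notebooks is to keep the worker function in a separate importable module instead of defining it in the notebook itself, since a "spawn" child has to `import` whatever it runs and can't see functions defined interactively. A rough sketch, where the file name `workers.py` is just a placeholder:

# workers.py -- a plain module that child processes can import
def add(a, b):
    return a + b

# in the notebook (or any interactive session)
import multiprocessing as mp
from workers import add

if __name__ == "__main__":
    with mp.get_context("spawn").Pool() as p:
        print(p.starmap(add, [(1, 1), (2, 2), (3, 3)]))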

I can't know exactly what your issue is here, as you haven't provided all your code, what OS you're using, or what IDE you're using, but I can at least leave you with a working (on my setup) example. (Windows 10; Python 3.9; Spyder IDE with run settings -> execute in an external system terminal)

import multiprocessing as mp

def add(a, b): #I'm assuming your "add" function looks a bit like this...
    return a+b

if __name__ == "__main__": 
    #this is critical when using "spawn" so code doesn't run when the file is imported
    #you should only define functions, classes, and static data outside this (constants)
    #most critically, it shouldn't be possible for a new child process to start outside this
    
    ctx = mp.get_context("spawn")
    #This is the only context available on windows, and the default for MacOS since python 3.8.
    #  Contexts are an important topic somewhat unique to python multiprocessing, and you should
    #  absolutely do some additional reading about "spawn" vs "fork". tldr; "spawn" starts a new
    #  process with no knowledge of the old one, and must `import` everything from __main__. 
    #  "fork" on the other hand copies the existing process and all its memory before branching. This is
    #  faster than re-starting the interpreter, and re-importing everything, but sometimes things
    #  get copied that shouldn't, and other things that should get copied don't.
    with ctx.Pool() as p: 
        #using `with` automatically shuts down the pool (forcibly) at the end of the block so you don't have to call `close` or `join`.
        #  It was also pointed out that due to the forcible shutdown, async calls like `map_async` may not finish unless you wait for the results
        #  before the end of the `with` block. `starmap` already waits for the results in this case however, so extra waiting is not needed.
        tasks = [(1,1),(2,2),(3,3)]
        print(p.starmap(add, tasks))
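If you do reach for one of the async variants, the sketch below (assuming the same `add` function) shows the pattern: call `.get()` on the `AsyncResult` before the `with` block ends, so the pool isn't torn down while work is still pending.

import multiprocessing as mp

def add(a, b):
    return a + b

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    with ctx.Pool() as p:
        tasks = [(1,1),(2,2),(3,3)]
        result = p.starmap_async(add, tasks)  # returns an AsyncResult immediately
        print(result.get())  # .get() blocks until all tasks finish, so nothing is
                             # left running when the block exits and the pool shuts down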
  • One small point: exiting the `with` block results in just a call to `p.terminate()`, so if there are any tasks running or scheduled to run, they will be killed. – Booboo Nov 13 '21 at 19:59
  • @Booboo TIL...Thanks! `starmap` in this case is synchronous however, so all processing should be done anyway. This is important to know with any of the async functions of `Pool` however. It tracks with the practice that you should always `get` results from a queue before `join`ing a process (or pool in this case). – Aaron Nov 13 '21 at 20:26