
Good news: Run this with python3 test.py and then press ctrl-c. It stops as it should.

Bad news: Run this with mpirun -n 1 python3 test.py and press ctrl-c. Oops, mpirun gets terminated, but all the python processes spawned by multiprocessing.Pool live on forever. How can I fix this?

test.py:

from mpi4py import MPI
import multiprocessing as mp
import signal
import time

class GracefulKiller:
    kill_now = False

    def __init__(self):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)
    def exit_gracefully(self, signum, frame):
        self.kill_now = True
        print("I kill")

def worker(e):
    killer = GracefulKiller()
    while(True):
        if killer.kill_now:
            e.set()
        if e.is_set():
            return

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    killer = GracefulKiller()
    with mp.Manager() as manager:
        e = manager.Event()
        pool = mp.Pool()
        arg = []
        for i in range(100):
            arg += [e]
        r = pool.map_async(worker, arg)
        r.get()
        pool.close()  # close() must be called before join()
        pool.join()
        if killer.kill_now:
            e.set()
        if e.is_set():
            comm.Abort()

main()

The GracefulKiller class is from the question "How to process SIGTERM signal gracefully?"

mpirun is from Open MPI. I tested this on Ubuntu and CentOS.


Update:

  • I added the line print("I kill"). Then I tried ctrl-c again with mpirun. It prints "I kill" once, but a bunch of python3 processes still live on.

Update2:

  • Added a name to GracefulKiller.
  • Added pool.terminate() to try to kill all processes spawned by the main process's pool when the main process catches ctrl-c.

test.py:

from mpi4py import MPI
import multiprocessing as mp
import signal
import time
class GracefulKiller:
    kill_now = False

    def __init__(self, name, pool=None):
        signal.signal(signal.SIGINT, self.exit_gracefully)
        signal.signal(signal.SIGTERM, self.exit_gracefully)
        self.name = name
        self.pool = pool
    def exit_gracefully(self, signum, frame):
        self.kill_now = True
        print("I kill.", self.name)
        if self.pool is not None:
            self.pool.close()
            self.pool.terminate()

def worker(e):
    killer = GracefulKiller('worker')
    while(True):
        if killer.kill_now:
            e.set()
        if e.is_set():
            return

def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    with mp.Manager() as manager:
        e = manager.Event()
        pool = mp.Pool()

        killer = GracefulKiller('main', pool)
        arg = []
        for i in range(100):
            arg += [e]
        r = pool.map_async(worker, arg)
        r.get()
        pool.close()  # close() must be called before join()
        pool.join()
        if killer.kill_now:
            e.set()
        if e.is_set():
            comm.Abort()

main()

  • python3 test.py (and then ctrl-c):

    A bunch of python3 processes live on. Output:

I kill. worker/main ...
...
  File "test.py", line 20, in exit_gracefully
    self.pool.terminate()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 505, in terminate
    self._terminate()
  File "/usr/lib/python3.5/multiprocessing/util.py", line 186, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 535, in _terminate_pool
...
I kill. worker/main ...

  • mpirun -n 1 python3 test.py (and then ctrl-c):

    Prints nothing, and a bunch of python3 processes live on.
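
One thing that may contribute to the surviving processes (an assumption, not verified against this exact setup): on POSIX, pool.terminate() stops workers by calling Process.terminate(), which sends SIGTERM, and the workers here install a SIGTERM handler that only sets a flag. The sketch below uses hypothetical names (stubborn_worker, survives_terminate) and a plain Process instead of a Pool to show that such a process is still alive after terminate(). It assumes a POSIX system with the fork start method and Python 3.7+ for Process.kill().

```python
import multiprocessing as mp
import signal
import time


def stubborn_worker(ready):
    """Catch SIGTERM but only record it, like GracefulKiller in a worker."""
    flag = {"hit": False}

    def handler(signum, frame):
        flag["hit"] = True  # record the signal, do not exit

    signal.signal(signal.SIGTERM, handler)
    ready.set()  # tell the parent the handler is installed
    deadline = time.monotonic() + 3.0
    while time.monotonic() < deadline:  # bounded so the demo cannot leak
        time.sleep(0.05)


def survives_terminate(wait=0.5):
    """Return True if the child is still alive after terminate()."""
    ctx = mp.get_context("fork")  # assumption: POSIX only
    ready = ctx.Event()
    proc = ctx.Process(target=stubborn_worker, args=(ready,))
    proc.start()
    ready.wait(5)
    proc.terminate()  # sends SIGTERM on POSIX
    proc.join(wait)
    alive = proc.is_alive()
    if alive:
        proc.kill()  # SIGKILL cannot be caught
        proc.join()
    return alive


if __name__ == "__main__":
    print("child survived terminate():", survives_terminate())
```

If this is what happens here, leaving SIGTERM at its default action inside the workers would let pool.terminate() actually stop them.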


Things that might help:

Python multiprocessing: Kill worker on exit

hamster on wheels
  • any suggestion? – hamster on wheels Feb 22 '17 at 20:04
  • The main process receives the signal, but the children don't; under MPI, they could run on different hosts. I suspect that `comm.Abort` is not the right way to send the termination signal, or that `mpirun` terminates child processes using different signals. [Apparently](http://stackoverflow.com/a/32225536/223424) `SIGINT` and `SIGTERM` differ in this regard, but Ctrl+C sends SIGINT. – 9000 Feb 22 '17 at 20:04
  • You are right. Only the main process's graceful killer catches the ctrl-c. The child processes don't do a thing. – hamster on wheels Feb 22 '17 at 20:13
  • Still can't find a fix. – hamster on wheels Feb 22 '17 at 21:16
  • @hamsteronwheels ever figure this out? I'm having the same issue and it's pretty frustrating. – Mike Nov 10 '20 at 21:46
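
To check which processes a terminal ctrl-c can reach at all, the following diagnostic sketch may help. Ctrl-c delivers SIGINT to the terminal's foreground process group, so it prints the pid, parent pid and process group of the main process and of each pool worker next to the foreground group. The function names are hypothetical, the fork start method is assumed (POSIX only), and whether mpirun places its children in a separate process group is something this would reveal rather than something assumed here.

```python
import multiprocessing as mp
import os


def describe_process(_=None):
    """Return the ids that decide which signals reach this process."""
    return {"pid": os.getpid(), "ppid": os.getppid(), "pgid": os.getpgrp()}


def foreground_pgid():
    """Foreground process group of the controlling terminal, or None."""
    try:
        fd = os.open("/dev/tty", os.O_RDONLY)
    except OSError:
        return None  # no controlling terminal
    try:
        return os.tcgetpgrp(fd)
    except OSError:
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    print("foreground pgid:", foreground_pgid())
    print("main:", describe_process())
    ctx = mp.get_context("fork")  # assumption: POSIX only
    with ctx.Pool(2) as pool:
        for info in pool.map(describe_process, range(4)):
            print("worker:", info)
```

Comparing the output of python3 on its own with the output under mpirun -n 1 python3 should show whether the workers' pgid still matches the foreground group.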

0 Answers