95

How do I catch Ctrl+C in a multiprocess Python program and exit all processes gracefully? I need the solution to work on both Unix and Windows. I've tried the following:

import multiprocessing
import time
import signal
import sys

jobs = []

def worker():
    signal.signal(signal.SIGINT, signal_handler)
    while True:
        time.sleep(1.1234)
        print("Working...")

def signal_handler(signum, frame):
    # signum is named so that it does not shadow the signal module
    print('You pressed Ctrl+C!')
    # for p in jobs:
    #     p.terminate()
    sys.exit(0)

if __name__ == "__main__":
    for i in range(50):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

It kind of works, but I don't think it's the right solution.

pppery
zenpoy

3 Answers

96

The previously accepted solution has race conditions and it does not work with map and async functions.


The correct way to handle Ctrl+C/SIGINT with multiprocessing.Pool is to:

  1. Make the process ignore SIGINT before the process Pool is created. This way, the child processes it creates inherit the ignored SIGINT disposition.
  2. Restore the original SIGINT handler in the parent process after the Pool has been created.
  3. Use map_async and apply_async instead of the blocking map and apply.
  4. Wait on the results with a timeout, because the default blocking wait ignores all signals. This is Python bug https://bugs.python.org/issue8296.

Putting it together:

#!/usr/bin/env python
from __future__ import print_function

import multiprocessing
import os
import signal
import time

def run_worker(delay):
    print("In a worker process", os.getpid())
    time.sleep(delay)

def main():
    print("Initializing 2 workers")
    original_sigint_handler = signal.signal(signal.SIGINT, signal.SIG_IGN)
    pool = multiprocessing.Pool(2)
    signal.signal(signal.SIGINT, original_sigint_handler)
    try:
        print("Starting 2 jobs of 5 seconds each")
        res = pool.map_async(run_worker, [5, 5])
        print("Waiting for results")
        res.get(60) # Without the timeout this blocking call ignores all signals.
    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating workers")
        pool.terminate()
    else:
        print("Normal termination")
        pool.close()
    pool.join()

if __name__ == "__main__":
    main()

As @YakovShklarov noted, there is a window of time between ignoring the signal and restoring the handler in the parent process, during which the signal can be lost. Using pthread_sigmask to temporarily block delivery of the signal in the parent process instead would prevent the signal from being lost. However, it is not available in Python 2.
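
A minimal sketch of the `pthread_sigmask` idea, assuming Python 3 on Unix with the fork start method (it is not the tested code from this answer). The helper name `run_with_sigint_blocked` and its parameters are hypothetical. SIGINT is blocked rather than ignored while the Pool is created, so a Ctrl+C arriving in that window stays pending instead of being lost:

```python
import multiprocessing
import signal


def run_with_sigint_blocked(func, items, processes=2, timeout=60):
    """Map func over items in a Pool whose workers have SIGINT blocked."""
    # Assumption: Unix with the fork start method.
    ctx = multiprocessing.get_context("fork")
    # Block (not ignore) SIGINT: a Ctrl+C arriving now stays pending.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        pool = ctx.Pool(processes)  # workers inherit the blocked mask
    except BaseException:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        raise
    try:
        # Restore the mask inside the try block, so a pending SIGINT
        # surfaces as KeyboardInterrupt where it can be handled.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        results = pool.map_async(func, items).get(timeout)
    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating workers")
        pool.terminate()
        results = None
    else:
        pool.close()
    finally:
        pool.join()
    return results
```

For example, `run_with_sigint_blocked(abs, [-1, -2])` should return `[1, 2]`, or `None` if Ctrl+C is pressed before the jobs finish. Because the workers keep SIGINT blocked, the sketch stops them with `pool.terminate()` rather than relying on the signal reaching them.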

Muhammad Usman Bashir
Maxim Egorushkin
  • This answer works in Windows with Python 2.7.11 though a SIGINT throws a long traceback. In 3.5.2 this answer does not work - SIGINT never properly exits. Instead you get `KeyboardInterrupt` and `Process SpawnPoolWorker` messages and tracebacks from the multiprocessing module and you have to kill the spawned processes manually. The accepted answer doesn't exhibit this behavior. – MartyMacGyver Jul 08 '16 at 09:34
  • @MartyMacGyver Only tested this with Python 2.7.x. – Maxim Egorushkin Aug 28 '16 at 05:38
  • Also works on Linux (Ubuntu 16.04) with Python 2.7.11. This really should be the accepted answer. – theV0ID Sep 30 '16 at 10:00
  • 1
    Seems like you have to use map_async, not map. Can anyone explain the difference in signal handling? (Calling .get on the map_async result didn't seem necessary either.) – ThorSummoner Feb 17 '17 at 01:06
  • This answer will ignore a SIGINT which is sent between the two calls to signal(). Besides, are you certain that subprocesses inherit SIG_IGN? Perhaps you could use signal.pthread_sigmask() to block and unblock SIGINT instead, and set SIG_IGN from a pool initializer function. – Yakov Shklarov Mar 08 '17 at 08:26
  • @YakovShklarov `pthread_sigmask` would be ideal, however, this API is absent in Python 2. With regards to signal disposition inheritance - look that up, that would be a smart thing to do. – Maxim Egorushkin Mar 08 '17 at 10:18
  • Oh, you're right. It's implicit in the behavior of fork(). – Yakov Shklarov Mar 08 '17 at 20:21
  • Which one was the accepted solution first mentioned? This is now the accepted solution. – sapht Apr 27 '17 at 14:43
  • 11
    This didn't work for me with Python 3.6.1 on Windows 10, KeyboardInterrupt is not caught – szx Jul 02 '17 at 04:13
  • @szx It was not tested with Python 3 or on Windows. – Maxim Egorushkin Jul 02 '17 at 10:22
  • Thank you for this. Is there any way to retrieve something from the children when receiving `KeyboardInterrupt`? – Boop Sep 07 '17 at 19:16
  • 1
    @Boop I am not sure, one would need to investigate that. – Maxim Egorushkin Sep 08 '17 at 00:16
  • 6
    This solution is not portable as it works only on Unix. Moreover, it would not work if the user sets the `maxtasksperchild` Pool parameter. The newly created processes would inherit the standard `SIGINT` handler again. The [pebble](https://pypi.python.org/pypi/Pebble) library disables `SIGINT` by default for the user as soon as the new process is created. – noxdafox Sep 12 '17 at 06:47
  • apparently `join()` is non-blocking because I used `join` before the `get()` **without** delay and it still worked fine! Thanks for the solution – Tomerikoo Jul 07 '19 at 12:29
  • 1
    Note that the blocking calls issue has been resolved in Python 3.3, you can use `map()`, `apply()` and `get()` without a timeout: https://bugs.python.org/issue9205 – Luper Rouch Sep 28 '19 at 08:09
  • @LuperRouch Thanks for the update. I wonder if it was ported to Python 2. – Maxim Egorushkin Sep 28 '19 at 18:04
  • Nope, 2.7.16 still has the issue. – Luper Rouch Sep 29 '19 at 20:15
  • I found this pretty confusing until I realized `original_sigint_handler = signal.signal(signal.SIGINT, signal.SIG_IGN)` should be `original_sigint_handler = signal.getsignal(signal.SIGINT)`. – Sevag Nov 13 '19 at 01:13
  • 1
    Also worth noting that `signal.signal(signal.SIGINT, signal.SIG_IGN)` returns the **previous configured signal**. – Chen A. Mar 22 '21 at 07:08
38

The solution is based on this link and this link, and it solved the problem. I had to move to Pool, though:

import multiprocessing
import time
import signal
import sys

def init_worker():
    # Runs in each worker process: ignore SIGINT so that only the parent handles Ctrl+C.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def worker():
    while True:
        time.sleep(1.1234)
        print("Working...")

if __name__ == "__main__":
    pool = multiprocessing.Pool(50, init_worker)
    try:
        for i in range(50):
            pool.apply_async(worker)

        time.sleep(10)
        pool.close()
        pool.join()

    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating workers")
        pool.terminate()
        pool.join()
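
For Python 3, the same idea can be sketched as a reusable helper. This is an illustration, not the code tested above: `run_jobs` and its parameters are hypothetical names, and calling `get()` without a timeout assumes Python 3.3 or later, where a comment on the other answer reports the blocking-wait issue as resolved. Because `Pool` calls the initializer in each worker process it starts, workers started later as replacements (for example when `maxtasksperchild` is set) ignore SIGINT too:

```python
import multiprocessing
import signal


def init_worker():
    # Runs once in every worker process the Pool starts.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def run_jobs(func, items, processes=2, maxtasksperchild=None, ctx=None):
    """Map func over items; return the results, or None if interrupted."""
    ctx = ctx or multiprocessing.get_context()
    pool = ctx.Pool(processes, initializer=init_worker,
                    maxtasksperchild=maxtasksperchild)
    try:
        # Assumption: Python 3.3+, where this wait can be interrupted.
        results = pool.map_async(func, items).get()
    except KeyboardInterrupt:
        print("Caught KeyboardInterrupt, terminating workers")
        pool.terminate()
        results = None
    else:
        pool.close()
    finally:
        pool.join()
    return results
```

With the spawn start method (the default on Windows), `run_jobs` would need to be called from under `if __name__ == "__main__":` and given a module-level function, as in the script above.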
zenpoy
  • That's a bit too late: there is a race condition window between `fork()` return in the child process and `signal()` call. The signal must be blocked before forking. – Maxim Egorushkin Jul 03 '12 at 15:55
  • 1
    @MaximYegorushkin - the signal is blocked in `init_worker` which is called before the `apply_async` - is that what you're talking about? – zenpoy Jul 03 '12 at 16:06
  • What I mean is that the signal must be blocked before the child process is forked and unblocked after. This way the child inherits the signal mask and has no chance to receive the signal. – Maxim Egorushkin Jul 03 '12 at 16:19
  • It's not too late at all. Also, init_worker is not called before apply_async. What happens is that the init_worker function (as an object) is passed to the Pool object. Then pool.apply_async is called and internally calls the init_worker function right before the child is forked. Not sure, though, how Pool deals with unblocking the signal. – Chris Koston Jan 21 '15 at 20:14
  • 9
    This only works because of the time.sleep. If you try to `get()` the results of the `map_async` call instead, the interrupt is delayed until processing is complete. – Clément Jun 03 '15 at 17:25
  • 1
    This is a wrong answer. Correct answer: http://stackoverflow.com/a/35134329/412080 – Maxim Egorushkin Feb 01 '16 at 15:59
  • This answer works in Windows with Python 2.7.11 and 3.5.2. – MartyMacGyver Jul 08 '16 at 09:26
  • `Pool` forks the number of child processes running `init_worker` _first_ and once per child process. This occurs _before_ async applying `worker` functions to one (of the pool made available) child processes. This means the SIG_IGN is applied inside each child process and there is no need for pool to deal with unblocking the signal since it won't affect the parent (@ChrisKoston). One can verify with `print(os.getpid())` in both init_worker and worker and note each worker pid will have had one init_worker run beforehand in the process with the same pid. – Stephan Scheller Jan 13 '17 at 05:52
  • 2
    Sure it works. But it's wrong. From the docs: "each worker process will call initializer(*initargs) when it starts." That's "when", not "before". So: a race condition. Here's what can happen: The subprocess is created, but before signal.signal() completes, SIGINT is sent! The subprocess aborts with an uncaught KeyboardInterrupt. This is rare but there are no guarantees it won't happen. (Actually it might not be so rare if you're spawning tons of workers.) If you don't block, the worst thing that could happen would seem to be just crud on your terminal. Still, this is bad practice. – Yakov Shklarov Mar 08 '17 at 08:06
  • If you run into problems while blocked in a call to `get()`, try adding `while not p.ready(): time.sleep(10)` before it. – jayatubi Jun 28 '22 at 11:04
15

Just handle the KeyboardInterrupt and SystemExit exceptions in your worker process:

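def worker(msg_queue):
    # msg_queue is passed in explicitly, e.g. a multiprocessing.Queue
    # created by the parent; the original referred to self.msg_queue.
    while True:
        try:
            msg = msg_queue.get()
        except (KeyboardInterrupt, SystemExit):
            print("Exiting...")
            break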
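
A fuller sketch of this approach with plain `multiprocessing.Process` workers, under some assumptions that are not in the answer: a `None` sentinel marks the end of the work, a second queue (`done_queue`) reports how many messages each worker handled, and `run` is a hypothetical driver. It also assumes Ctrl+C is typed at a terminal, so the parent and every worker receive the interrupt:

```python
import multiprocessing


def worker(msg_queue, done_queue):
    """Handle messages until a None sentinel or an interrupt arrives."""
    handled = 0
    try:
        while True:
            msg = msg_queue.get()
            if msg is None:  # hypothetical "no more work" sentinel
                break
            handled += 1  # stand-in for real processing of msg
    except (KeyboardInterrupt, SystemExit):
        print("Exiting...")
    finally:
        done_queue.put(handled)  # report progress even when interrupted


def run(messages, n_workers=2, ctx=None):
    """Feed messages to the workers and return how many were handled."""
    ctx = ctx or multiprocessing.get_context()
    msg_queue, done_queue = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(msg_queue, done_queue))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    total = 0
    try:
        for msg in messages:
            msg_queue.put(msg)
        for _ in procs:
            msg_queue.put(None)  # one sentinel per worker
        for _ in procs:
            total += done_queue.get()
    except KeyboardInterrupt:
        print("Parent interrupted, waiting for workers to exit")
    finally:
        for p in procs:
            p.join(5)
            if p.is_alive():  # this worker never saw the interrupt
                p.terminate()
                p.join()
    return total
```

If the parent is interrupted, `run` returns only the counts it had collected so far. With the spawn start method (the default on Windows), it would need to be called from under `if __name__ == "__main__":`.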
derkan
  • For signals that make Python raise SystemExit, this indeed works, on Python 3.6, too. I wonder though, what signals does that include? I would guess SIGKILL and SIGTERM ...? – Petri Jan 17 '17 at 08:07
  • 1
    You can easily check which signals that includes and the answer is: I think none. SystemExit is only raised by sys.exit according to the docs. Just execute `try: time.sleep(60) except BaseException as e: print(e)` and you'll see if a specific signal is caught (ime only SIGINT). That's what the manpage states, too. – t.animal May 15 '17 at 15:37
  • @Petri It's probably just SIGINT. I believe SIGKILL is uncatchable, and SIGTERM is something else. – dstromberg Jun 08 '20 at 17:33