
I have the following Python script, test_mpi.py:

from mpi4py import MPI
import time

class Foo:
    def __init__(self):
        print('Creation object.')
    def __del__(self):
        print('Object destruction.')

foo = Foo()
time.sleep(10)

If I run it directly with python test_mpi.py (without mpiexec) and press Ctrl+C after about 5 s, I get the following output:

ngreiner@Nathans-MacBook-Pro:~/Documents/scratch$ python test_mpi.py 
Creation object.
^CTraceback (most recent call last):
  File "test_mpi.py", line 26, in <module>
    time.sleep(10)
KeyboardInterrupt
Object destruction.
ngreiner@Nathans-MacBook-Pro:~/Documents/scratch$

If I instead launch it through mpiexec with mpiexec -np 1 python test_mpi.py and again press Ctrl+C after about 5 s, I now get:

ngreiner@Nathans-MacBook-Pro:~/Documents/scratch$ mpiexec -np 1 python test_mpi.py 
Creation object.
^Cngreiner@Nathans-MacBook-Pro:~/Documents/scratch$

Both the Python traceback and the call to the __del__ method have disappeared. The main problem for me is that __del__ is not executed, since in my actual application it is supposed to do some clean-up.

Any idea how I could get the __del__ method executed when Python is launched from mpiexec?

Thank you very much in advance for the help,

(My system configuration: macOS High Sierra 10.13.6, Python 3.7.4, open-mpi 4.0.1, mpi4py 3.0.2.)

ngreiner

3 Answers


After a bit of searching, I found a solution that restores both the printing of the traceback and the execution of the __del__ method when hitting ^C under mpiexec.

During a normal Python run (launched directly from the terminal, not through mpiexec), hitting ^C sends a SIGINT signal to the Python process, whose default handler turns it into a KeyboardInterrupt exception (https://docs.python.org/3.7/library/signal.html).

But when you hit ^C during an mpiexec run, it is the mpiexec process that receives the SIGINT signal, and instead of forwarding it to its child processes (for instance Python), it sends them a SIGTERM signal, followed a few seconds later by a SIGKILL (https://www.open-mpi.org/doc/current/man1/mpirun.1.php).

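Python does not react to SIGINT and SIGTERM in the same way: it installs a handler for SIGINT that raises KeyboardInterrupt, but leaves SIGTERM at its default action, which terminates the process immediately without running any more Python code. As a result, no traceback is printed and neither __del__ nor __exit__ is executed.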
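The difference can be observed without MPI by sending each signal to a child Python process directly. The sketch below runs a trimmed version of the question's script through subprocess; the "ready" line and the explicit SIGINT reset are additions for the demo (the reset only guards against the child inheriting an ignored SIGINT), and run_and_signal is an illustrative name. It assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Trimmed version of the question's script. "ready" tells the parent that foo
# exists; the SIGINT reset makes ^C raise KeyboardInterrupt even if this
# process happened to be started with SIGINT ignored.
CHILD = r'''
import signal, time
signal.signal(signal.SIGINT, signal.default_int_handler)

class Foo:
    def __del__(self):
        print("Object destruction.", flush=True)

foo = Foo()
print("ready", flush=True)
time.sleep(10)
'''


def run_and_signal(sig):
    """Start the child, wait until foo exists, send `sig`, return (returncode, stdout lines)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide the KeyboardInterrupt traceback
        text=True,
    )
    first = proc.stdout.readline()  # blocks until the child prints "ready"
    proc.send_signal(sig)
    rest, _ = proc.communicate(timeout=30)
    return proc.returncode, (first + rest).splitlines()


results = {sig: run_and_signal(sig) for sig in (signal.SIGINT, signal.SIGTERM)}
for sig, (code, lines) in results.items():
    print(sig.name, code, lines)
```

With SIGINT the child still prints "Object destruction." during interpreter shutdown; with SIGTERM it is killed with return code -15 and only "ready" is seen.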

The workaround I found is to use the signal module to install a handler for SIGTERM that simply raises a KeyboardInterrupt, so that SIGTERM is handled the same way as SIGINT. This can be achieved with the following lines:

def sigterm_handler():
    raise KeyboardInterrupt

import signal
signal.signal(signal.SIGTERM, sigterm_handler)

These lines can be included at the top of the executed Python script or, to get this behaviour every time Python is used with mpiexec and mpi4py, at the top of the mpi4py package's __init__.py file.

This strategy may have side effects that I am not aware of, so use it at your own risk.

ngreiner

According to the documentation, it is not guaranteed that __del__ will be called, so you are lucky that it is called in the non-MPI program.

In a simple case, you could use try/finally to make sure the finally block is executed or, more generally, use a context manager.

Here is the relevant quote from the documentation:

It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
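A minimal sketch of the context-manager approach, using contextlib.contextmanager; managed_resource and events are hypothetical stand-ins for whatever Foo.__del__ cleans up. The finally clause runs both on normal exit and when an exception such as KeyboardInterrupt propagates out of the with block. Note that this only helps when the signal actually becomes a Python exception, which SIGTERM does not do by default.

```python
import contextlib

events = []


@contextlib.contextmanager
def managed_resource(name):
    """Hypothetical resource: acquisition and release are just recorded in `events`."""
    events.append("acquire " + name)
    try:
        yield name
    finally:
        # Runs on normal exit and when an exception (e.g. KeyboardInterrupt) leaves the block
        events.append("release " + name)


try:
    with managed_resource("scratch"):
        raise KeyboardInterrupt  # simulates ^C arriving while the resource is in use
except KeyboardInterrupt:
    pass

print(events)  # ['acquire scratch', 'release scratch']
```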

vav
  • In my case, it is very unlikely that the object whose __del__ method should be invoked gets caught in a reference cycle. So whenever its reference count drops to 0, it should be destroyed and its __del__ method invoked (at least in CPython, as far as I understand). Are there other reasons why the __del__ method may not be executed? – ngreiner Sep 26 '19 at 16:41
  • I added a quote from the documentation: it has nothing to do with CPython. It is simply not guaranteed to be called. To free resources, the recommended approach is a context manager; it is more explicit and therefore more "pythonic". – vav Sep 26 '19 at 18:25
  • You could also handle signals yourself (`https://docs.python.org/3.7/library/signal.html`), so that when the script is interrupted you get control back and can free the required resources. – vav Sep 26 '19 at 18:28
  • Thank you. I will post a new question to explain my problem more clearly, independently of how the __del__ method works. But regarding my first question: you say that it is by luck that __del__ is called in the non-MPI program, but why is it always called in the non-MPI program and never called in the MPI program? And what about the traceback? – ngreiner Sep 27 '19 at 09:34
  • Hm... when you interrupt a Python process, it is the Python interpreter that receives the signal. When mpiexec receives an interrupt signal, what does it do? Does it send the same signal to all child processes, or does it simply exit and leave them running? If the processes are still running, they would finish, but their stdout/stderr would not be collected. – vav Sep 27 '19 at 13:14
  • After implementing a context manager in my application, I have the same problem with it: hitting ^C during a normal Python run prints the traceback and invokes the context manager's __exit__ method, while hitting ^C under mpiexec suppresses both the printing and the method call... I could upload an MWE to demonstrate the issue if needed. – ngreiner Sep 27 '19 at 13:17
  • "by luck": at the moment, CPython (as a program) over-delivers compared with what the language promises. But the next version might behave slightly differently, and __del__ might not be called. – vav Sep 27 '19 at 13:18
  • Try writing to a file instead of stdout/stderr. Maybe MPI does leave everybody behind. – vav Sep 27 '19 at 13:20
  • From the Open MPI manpage: "When orterun (<=> mpiexec <=> mpirun) receives a SIGTERM and SIGINT, it will attempt to kill the entire job by sending all processes in the job a SIGTERM, waiting a small number of seconds, then sending all processes in the job a SIGKILL." – ngreiner Sep 27 '19 at 13:31
  • Also, as a troubleshooting step, you could handle signals in the Python process and log them when they arrive (starting with the harmless SIGUSR1). – vav Sep 27 '19 at 13:44
  • Indeed, upon hitting ^C during a normal Python run, the signal received by Python is SIGINT, whereas upon hitting ^C during an mpiexec run that spawns a Python process, the signal received by Python is SIGTERM. This is probably why the traceback is not printed and the __exit__ or __del__ methods are not executed. – ngreiner Sep 27 '19 at 14:12
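A sketch of the troubleshooting step suggested in these comments: a handler that appends the name of each received signal to a file, which survives even if mpiexec stops collecting stdout/stderr. LOG_PATH and log_signal are illustrative names, the self-sent os.kill calls stand in for signals that the terminal or mpiexec would deliver, and the code assumes a POSIX system (SIGUSR1 does not exist on Windows).

```python
import os
import signal
import tempfile
import time

# Illustrative log location: one file per process, so MPI ranks do not interleave
LOG_PATH = os.path.join(tempfile.gettempdir(), f"signals-{os.getpid()}.log")


def log_signal(signum, frame):
    """Record which signal arrived; it does not raise, so the process keeps running."""
    with open(LOG_PATH, "a") as f:
        f.write(f"pid {os.getpid()} received {signal.Signals(signum).name}\n")


for sig in (signal.SIGUSR1, signal.SIGINT, signal.SIGTERM):
    signal.signal(sig, log_signal)

# Demo: send two signals to ourselves, then show what was logged
os.kill(os.getpid(), signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGTERM)
time.sleep(0.1)  # gives the handlers a chance to run if they have not already
with open(LOG_PATH) as f:
    print(f.read())
```

Because this handler swallows SIGTERM instead of exiting, under mpiexec the job would only stop once mpiexec falls back to SIGKILL, so it is meant for diagnosis rather than as a fix.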

The answer by ngreiner helped me, but at least with Python 2.7 and all Python 3 versions, the handler function must accept two arguments (the signal number and the current stack frame). This modified code snippet, which takes them as unused arguments, worked for me:

import signal

def sigterm_handler(signum, frame):
    raise KeyboardInterrupt

signal.signal(signal.SIGTERM, sigterm_handler)
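To check the handler without MPI, one can send SIGTERM to the current process with os.kill, which here stands in for the SIGTERM that mpiexec sends on ^C. With the handler installed, the signal becomes a KeyboardInterrupt, so the finally clause (a stand-in for the clean-up that __del__ or __exit__ should do) runs before the exception propagates. This is a sketch assuming a POSIX system; events is an illustrative name.

```python
import os
import signal
import time


def sigterm_handler(signum, frame):
    # signum is signal.SIGTERM here; frame (the interrupted stack frame) is unused
    raise KeyboardInterrupt


signal.signal(signal.SIGTERM, sigterm_handler)

events = []
try:
    try:
        # Stand-in for the SIGTERM that mpiexec sends to its children on ^C
        os.kill(os.getpid(), signal.SIGTERM)
        time.sleep(10)  # interrupted as soon as the handler raises
    finally:
        events.append("clean-up ran")  # the kind of work __del__ / __exit__ should do
except KeyboardInterrupt:
    events.append("KeyboardInterrupt")

print(events)  # ['clean-up ran', 'KeyboardInterrupt']
```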
LSchueler