
I'm working on a project which uses this C++ matplotlib wrapper matplotlibcpp.h.

A minimal example using this original header file is

    #include "matplotlibcpp.h"

    namespace plt = matplotlibcpp;

    int main() {
        plt::plot({1,3,2,4});
        plt::show();
    }

Note: The segmentation fault doesn't seem to depend on this particular example; it shows up for any program that calls a function from the matplotlibcpp.h header file. I chose this plotting example because the actual plotting works: you see the plot, but once you close it and the program finishes, you get the segmentation fault. It is also one of the official examples on the project's GitHub page.

You could also replace both lines in the main function with e.g. plt::figure() and you'd still get a working program and a segmentation fault at the very end of the execution.

Compiling against Python 2.7 and running the result works fine:

    $ g++ minimal.cpp -std=c++11 -I/usr/include/python2.7 -I/home/<user>/.local/lib/python2.7/site-packages/numpy/core/include/ -lpython2.7

    $ ldd a.out
    linux-vdso.so.1 (0x00007ffe1f3f7000)
    libpython2.7.so.1.0 => /usr/lib/libpython2.7.so.1.0 (0x00007f8320f8f000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f8320db2000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f8320c6d000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f8320c53000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f8320a86000)
    libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f8320a65000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f8320a5c000)
    libutil.so.1 => /usr/lib/libutil.so.1 (0x00007f8320a57000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f83211c2000)

Compiling against Python 3.9 also succeeds, but the resulting binary crashes on exit:

    $ g++ minimal.cpp -std=c++11 -I/usr/include/python3.9 -I/home/pascal/.local/lib/python3.9/site-packages/numpy/core/include/ -lpython3.9

Here, ./a.out ends with Segmentation fault (core dumped).

    $ ldd a.out
    linux-vdso.so.1 (0x00007fff8dbc5000)
    libpython3.9.so.1.0 => /usr/lib/libpython3.9.so.1.0 (0x00007f60176ec000)
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f601750f000)
    libm.so.6 => /usr/lib/libm.so.6 (0x00007f60173ca000)
    libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f60173b0000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007f60171e3000)
    libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f60171c2000)
    libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f60171b9000)
    libutil.so.1 => /usr/lib/libutil.so.1 (0x00007f60171b4000)
    /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f6017adf000)

Both are compiled on a system running Arch Linux with g++ version 10.2.0.

This has been reported as an issue in the project's repository, but so far no one has come up with a solution.

I isolated the problem to the call to Py_Finalize(). In Python 3 this calls Py_FinalizeEx(), which is one difference between Python 2 and Python 3.

In the matplotlibcpp.h file, Py_Finalize() is called in the destructor:

    ~_interpreter() {
        Py_Finalize();
    }

If you comment it out, the segmentation fault goes away. I'm honestly confused by this finalizing function, because the docs (for Python 3) state:

Bugs and caveats: The destruction of modules and objects in modules is done in random order; this may cause destructors (__del__() methods) to fail when they depend on other objects (even functions) or modules. Dynamically loaded extension modules loaded by Python are not unloaded. Small amounts of memory allocated by the Python interpreter may not be freed (if you find a leak, please report it). Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed. Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls Py_Initialize() and Py_FinalizeEx() more than once.

There's also a kill() function in the header file which calls the destructor explicitly, but it is never used.

It seems that the destructor only gets called when the object goes out of scope, i.e. they never use free() or delete. I suspect it tries to free something that has already been freed, but figuring that out is hard because I'm unfamiliar with the Python C API.

The stack trace: (I hope I installed the python debug symbols correctly. Not sure why the Qt5 widgets symbols don't show.)

Note: I compiled the below stacktrace with -std=c++17 -Wall -g

Also note that the function matplotlibcpp::detail::_interpreter::interkeeper(bool) calls the destructor explicitly, see kill(). I mention it because this function shows up in the stacktrace below, though I'm not sure why. The source code for that function has the following comment:

/* 
    For now, _interpreter is implemented as a singleton since its currently not possible to have
   multiple independent embedded python interpreters without patching the python source code
   or starting a separate process for each. [1]
   Furthermore, many python objects expect that they are destructed in the same thread as they
   were constructed. [2] So for advanced usage, a `kill()` function is provided so that library
   users can manually ensure that the interpreter is constructed and destroyed within the
   same thread.
     1: http://bytes.com/topic/python/answers/793370-multiple-independent-python-interpreters-c-c-program
     2: https://github.com/lava/matplotlib-cpp/pull/202#issue-436220256
   */

Stacktrace:

Thread 1 "MAIN" received signal SIGSEGV, Segmentation fault.
0x00007fffde884225 in ?? () from /usr/lib/libQt5Widgets.so.5
(gdb) bt
#0  0x00007fffde884225 in ?? () from /usr/lib/libQt5Widgets.so.5
#1  0x00007fffdf14540a in ?? () from /usr/lib/python3.9/site-packages/PyQt5/QtWidgets.abi3.so
#2  0x00007fffe2bc67eb in ?? () from /usr/lib/python3.9/site-packages/PyQt5/QtCore.abi3.so
#3  0x00007ffff7d0ea5c in cfunction_vectorcall_NOARGS (func=0x7fffe2cccb80, args=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:485
#4  0x00007ffff7e0ca69 in atexit_callfuncs (module=<optimized out>) at ./Modules/atexitmodule.c:93
#5  0x00007ffff7c744e7 in call_py_exitfuncs (tstate=0x555555597240) at Python/pylifecycle.c:2374
#6  0x00007ffff7dfc627 in Py_FinalizeEx () at Python/pylifecycle.c:1373
#7  0x000055555555926d in matplotlibcpp::detail::_interpreter::~_interpreter (this=0x55555555e620 <matplotlibcpp::detail::_interpreter::interkeeper(bool)::ctx>, 
    __in_chrg=<optimized out>) at /home/pascal/test/cpp/foo/matplotlibcpp.h:288
#8  0x00007ffff76d24a7 in __run_exit_handlers () from /usr/lib/libc.so.6
#9  0x00007ffff76d264e in exit () from /usr/lib/libc.so.6
#10 0x00007ffff76bab2c in __libc_start_main () from /usr/lib/libc.so.6
#11 0x000055555555646e in _start ()
xotix
  • "Note: The code example above doesn't really matter for the segementation fault. " -- that's a bad way to start a question. Care to provide a [mcve] then? In any case, please run `ldd` on the executable, just to rule out mixes between different Python binaries in your code. – Ulrich Eckhardt May 14 '21 at 11:34
  • What I meant is: Any code that makes at least one call into the given header file results in a segmentation fault at the end of the lifecycle. I chose a plotting example because it shows that the runtime works until you close the plot, at which point we are at the end of the program. I did that to show "the seg fault really seems to come at the very end". Let me rephrase it and add ldd. – xotix May 14 '21 at 11:41
  • If you can run your code with gdb and add a stacktrace of the crash to your question it would be helpful. (Run `bt` inside gdb after the segfault to get the stacktrace. You might need to download debug symbols for python to get function names) – unddoch May 16 '21 at 14:15
  • @unddoch Added a stacktrace. I had to compile python3.8.5 with debug flags on s.t. I get the debug symbols. I hope everything is correct now. – xotix May 17 '21 at 13:50

1 Answer

I don't have easy access to a Linux machine where I can test this, but I think I now understand what's happening.

  1. matplotlibcpp uses a static variable to hold the Python interpreter (see line 129 inside interkeeper(bool should_kill)). Like any C++ function-local static variable, it's initialized the first time the function is called and destructed on program exit.

  2. When main finishes, libc runs cleanup routines for all the shared libraries and for your program (that's __run_exit_handlers in the stacktrace). Since your program is a C++ program, part of its exit handler is destructing all the static variables that were used. One of them is the Python interpreter. Its destructor calls Py_Finalize() which is Python's cleanup routine. Until now, everything's fine.

  3. Python has a similar atexit mechanism that allows Python code from everywhere to register functions that should be called during the interpreter shutdown. Apparently, the backend matplotlib chose to use here is PyQt5. It seems to register such atexit callbacks.

  4. PyQt5's callback gets called, and crashes. Notice that this is internal PyQt5 code now. Why does this crash? My "educated" guess is that Qt's library exit handler was already called in step 2, before your program's exit handler was called. This apparently causes some weird state in the library (maybe some objects were freed?) and crashes.
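
The lifetime rule from step 1 can be sketched without Python at all. This is only an illustration: `FakeInterpreter`, `get_interpreter()` and `events()` are made-up names, and the constructor and destructor merely mark the spots where matplotlibcpp would call Py_Initialize() and Py_Finalize().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, only here to make the ordering observable.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// Stand-in for matplotlibcpp's _interpreter; it does NOT embed Python.
struct FakeInterpreter {
    FakeInterpreter()  { events().push_back("init"); }      // where Py_Initialize() would run
    ~FakeInterpreter() { events().push_back("finalize"); }  // where Py_Finalize() would run
};

// Same shape as interkeeper(): the object lives in a function-local static,
// so it is constructed on the first call and destroyed during process exit.
FakeInterpreter& get_interpreter() {
    static FakeInterpreter ctx;
    return ctx;
}
```

The first call to `get_interpreter()` logs `init`, and later calls reuse the same object. The destructor only runs while the process is exiting, after `main` has returned, which is exactly where step 2 picks up.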

This leaves two interesting questions:

  1. How to fix this? The solution should be to destruct ctx before your program exits, so the Python interpreter is finalized before any shared libraries tear themselves down. Static lifetimes are known for causing this kind of problem. If changing matplotlibcpp's interface to avoid global static state is not an option, I think you really have to call plt::detail::_interpreter::kill() manually at the end of your main function. You might also be able to register a callback with atexit() that kills the interpreter before the library teardown, but I haven't tested that.

  2. Why did this ever work? My guess is that maybe something in PyQt5's callbacks has changed that now causes this crash, or that you use a different backend in Python 2. If no other library is destructively terminating before the program exits, this is fine.
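
One way to make sure kill() runs at the end of main, including on early returns, is a small scope guard. The sketch below is an assumption on my part rather than anything matplotlibcpp ships: `ScopeExit` is a hypothetical helper, and the commented usage has not been tested against the real header.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper (not part of matplotlibcpp):
// runs the stored callback when the guard goes out of scope.
class ScopeExit {
public:
    explicit ScopeExit(std::function<void()> fn) : fn_(std::move(fn)) {}
    ~ScopeExit() { if (fn_) fn_(); }

    // Non-copyable, so the callback cannot accidentally run twice.
    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    std::function<void()> fn_;
};

// Intended usage with the real header (untested assumption):
//
//   int main() {
//       ScopeExit finalize_python([] { plt::detail::_interpreter::kill(); });
//       plt::plot({1, 3, 2, 4});
//       plt::show();
//   }   // kill() runs here, while main is still unwinding, before exit()
```

The point of the guard is only ordering: the callback fires when main's scope ends, which is before libc starts running exit handlers.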

unddoch
  • Thanks for pointing me in the direction of Qt being the real issue. I checked which backend matplotlib uses for python2 and python3. Python2 uses `TkAgg` whereas python3 uses `Qt5Agg`. Now, matplotlibcpp.h supports changing the backend. Adding `plt::backend("TkAgg");` to my minimal.cpp (as the very first call!) fixed the problem. If I understand correctly, this isn't the backend's issue, right? – xotix May 18 '21 at 11:47
  • No - this is matplotlibcpp's fault for not calling Py_Finalize() before exit() – unddoch May 18 '21 at 16:03
  • So if I understand you correctly, you are saying that the Python interpreter (or whatever exactly Py_Initialize() initializes) gets freed before the destructor of matplotlibcpp gets called? If so, I can't really see how to solve that and why it happens in the first place. Why doesn't the compiler "see" that we created that object and thus we should free it? – xotix May 18 '21 at 17:26
  • I can't really see why the cleanup routine would try to clean up the Python interpreter before calling the destructor `~_interpreter()` – xotix May 18 '21 at 17:31
  • So - what gets cleaned up is actually the shared libraries loaded into your process. This is something that libc does for you automatically: https://stackoverflow.com/questions/2053029/how-exactly-does-attribute-constructor-work – unddoch May 18 '21 at 17:33
  • Since `~_interpreter` is called only when the object is freed, and since the object has static lifetime, this is only done on process exit. However, during the process exit, the shared libraries have already been destructed. – unddoch May 18 '21 at 17:36