
I have a project running on Python 3.8 x64 on Windows, in an Anaconda environment.

It uses an in-house x64 C++ DLL, wrapped in Python using ctypes (the C++ code cannot be shared for various reasons, the most important being that I do not have access to 99% of the code myself ...). The project works perfectly fine with Python 3.8 x64.

The main C++ DLL entry point takes a string path to a JSON file for initialization. The ctypes binding code looks like this:

import ctypes

class RedactedName:

    def __init__(self, some_args): # redacted
        # some code
        self.dll = ctypes.CDLL(self.dll_path)
        self.dll.load_config_file.argtypes = [ctypes.c_char_p]
        self.dll.load_config_file.restype = ctypes.c_bool
        # some code

    def load_config_file(self, file_path: str) -> bool:
        self.conf_file_path = file_path # path of a JSON file, the C++ will load it
        s = ctypes.c_char_p(file_path.encode('utf-8'))
        b = self.dll.load_config_file(s) # <-- SEG FAULT HERE with py3.6.6 only. ok with py3.8
        return bool(b)

The corresponding C++ code (namely, the 1% I am 100% sure of):

bool load_config_file(/* in */ const char * input_json) {
    json conf;
    const std::string & json_file_path_str(input_json);
    REDACTED_FUNCTION_CALL_A(/* in */ json_file_path_str, /* out */ conf); // <-- seg fault in here, depending on python version, see GDB call stack
    // other stuff
    return true; // no joke
}

For various reasons, I need to downgrade to Python 3.6.6 x64 (a pip-only environment, if that matters).

With Python 3.6.6 x64, the call to self.dll.load_config_file crashes with SIGSEGV. The C++ DLL call stack looks like this (luckily the DLL contains some symbols):

# (with python breakpoint right before DLL load)
$ attach <python process PID>
$ catch load mylibname
$ catch throw
$ c 
# now, resume python execution
# when gdb breaks on dll load, add breakpoint
$ b load_config_file

exception caught

(gdb) n
Single stepping until exit from function
 
 
 
 _ZN3REDACTED5REDACTED8REDACTED4REDACTEDKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN8nlohmann10basic_jsonISt3mapSt6vectorS7_bxydSaNSA_14adl_serializerEEE,
    which has no line number information.

    Thread 1 received signal SIGSEGV, Segmentation fault.
    0x000000006fcd2194 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82
    82      /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h: No such file or directory.

Backtrace:

#0  0x000000006fcd2194 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82

#1  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82

#2  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:78

#3  std::locale::_Impl::_M_remove_reference (this=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/bits/locale_classes.h:564

#4  std::locale::operator= (this=this@entry=0xbc8f3ecf50, __other=
      @0xbc8f3ec9f8: {static none = 0, static ctype = 1, static numeric = 2, static collate = 4, static time = 8, static monetary = 16, static messages = 32, static all = 63, _M_impl = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_classic = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_global = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_categories = 0x6fd06aa0 <__gnu_cxx::category_names>, static _S_once = {done = 1, started = 0}, static _S_twinned_facets = 0x6fd09140 <std::locale::_S_twinned_facets>}) at ../../../../../src/libstdc++-v3/src/c++98/locale.cc:116

#5  0x000000006fce906c in std::ios_base::_M_init (this=this@entry=0xbc8f3ece80) at ../../../../../src/libstdc++-v3/src/c++98/ios_locale.cc:44

#6  0x000000006fceaf91 in std::basic_ios<char, std::char_traits<char> >::init (this=0xbc8f3ece80, __sb=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/bits/basic_ios.tcc:126

#7  0x000000006ce0b7fb in REDACTED_FUNCTION_CALL_A(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer>&) ()
   from C:\Users\<redacted>/<redacted>_lib.dll

#8  0x000000006cdc15c6 in load_config_file (input_json=0x27eee2877a0 "C:\\Users\\<a_valid_path_to_avalid_json_file>") at <redacted>

#9  0x0000000062880943 in _ctypes!DllCanUnloadNow () from C:\Program Files\Python36\DLLs\_ctypes.pyd

Additional information:

  • The python calls are purely sequential, there is no threading, so the s variable cannot be garbage collected before the end of the self.dll.load_config_file call.
  • The C++ DLL has not been recompiled. It is the exact same as in the Py3.8 environment. I can switch back and forth from py3.6 to 3.8 envs. 3.6 fails, 3.8 works.
  • the C++ DLL depends on libwinpthread-1.dll, libgcc_s_seh-1.dll, libstdc++-6.dll
  • DLL was built with GNAT PRO g++ (GCC) 8.3.1 20190923 (for GNAT Pro 20.1)
  • DLL compilation directive contains: "-std=c++11", "-O0"
  • DLL link directive contains "-static-libgcc"
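A minimal sketch to back the first point, assuming only standard ctypes behaviour: a `c_char_p` built from a `bytes` object keeps that object alive for as long as the `c_char_p` itself is referenced, and the buffer it points to is NUL-terminated. The path below is a hypothetical placeholder, not the real one.

```python
import ctypes

def as_c_path(file_path: str) -> ctypes.c_char_p:
    """Encode a Python str as a UTF-8 C string suitable for a const char* argument."""
    return ctypes.c_char_p(file_path.encode("utf-8"))

# Hypothetical path, for illustration only.
s = as_c_path("C:\\Users\\example\\conf.json")

# Read the bytes the pointer refers to, including one byte past the logical end.
# CPython bytes objects always carry a trailing NUL, so this read stays in bounds.
address = ctypes.cast(s, ctypes.c_void_p).value
raw = ctypes.string_at(address, len(s.value) + 1)

print(raw)
```

As long as `s` stays referenced until the DLL call returns, the pointer seen on the C++ side should remain valid for the duration of that call.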

Any clue why py3.6.6 gives this weird C++ call stack and SIGSEGV while py3.8 does not?

If the py3.8 version weren't working, I would suppose this is a duplicate of undefined reference to `__gnu_cxx::__exchange_and_add(int*, int)'. But that is not the case ...

Hypotheses:

  • std::basic_ios<char, std::char_traits<char> >::init seems related to stream buffer initialization. Could it be a conflict between standard output streams? Very unlikely.
  • Python 3.6 messing with environment variables? Very unlikely.
  • Python 3.6 messing with DLLs, creating dependency conflicts? Unlikely.

As far as I can tell, these questions are NOT related, because the function return type is bool, which matches the source code. And py3.8 works.

Python changelogs 3.6 3.7 3.8

Update 2021-11-04 1

~~ I have found a potential root cause.~~

I used catch load my_lib and catch load libstdc++ to be sure which libs were loaded.

When using the Py3.8 Anaconda environment, my C++ DLL is loaded (first catch), and then it loads the libstdc++-6.dll that is stored right next to it (second catch) (can't tell the version, that's the magic of black box binary deliveries ...).

When using the pip venv Py3.6, my C++ DLL is loaded (first catch), and then it loads the libstdc++-6.dll coming from my GNAT install (second catch).

So far, this may be the major difference I can spot. I am pretty sure the two libstdc++ DLLs have different versions.

I do not understand why library loading precedence differs between the two versions.
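One documented difference that may explain the precedence: starting with Python 3.8 on Windows, dependencies of DLLs loaded through ctypes are no longer resolved via PATH or the current working directory, only via system paths, the directory containing the DLL, and directories registered with `os.add_dll_directory()`. Earlier versions go through the usual Windows search order, which includes PATH. The sketch below is one way to make both interpreters look next to the DLL first; `prefer_dll_dir` is a hypothetical helper name, and whether this has any bearing on the crash is not established here.

```python
import os

def prefer_dll_dir(dll_path: str) -> str:
    """Put the directory containing dll_path ahead of the other PATH entries.

    Returns the absolute directory that was registered.
    """
    dll_dir = os.path.dirname(os.path.abspath(dll_path))

    # Python < 3.8: dependent DLLs are looked up through PATH (among other
    # places), so move the DLL's own directory to the front of PATH.
    entries = [p for p in os.environ.get("PATH", "").split(os.pathsep)
               if p and p != dll_dir]
    os.environ["PATH"] = os.pathsep.join([dll_dir] + entries)

    # Python >= 3.8 on Windows: PATH is ignored for this purpose, so register
    # the directory explicitly. The attribute only exists on Windows.
    if hasattr(os, "add_dll_directory") and os.path.isdir(dll_dir):
        os.add_dll_directory(dll_dir)

    return dll_dir
```

It would be called once before `ctypes.CDLL(self.dll_path)`, for example `prefer_dll_dir(self.dll_path)`.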

The content of os.environ differs slightly between the two envs, mainly because the env var LIBRARY_ROOTS points to the respective Python install lib folders (in both cases PATH contains the path to the GNAT Pro install bin dir).
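For reference, a comparison of this kind can be sketched as follows, assuming each environment has been captured as a plain dict (for instance with `dict(os.environ)` in each interpreter). `diff_env` is a hypothetical helper name, and the sample values are made up.

```python
from typing import Dict, Mapping, Optional, Tuple

def diff_env(a: Mapping[str, str],
             b: Mapping[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return every variable whose value differs between two environments.

    Each entry maps the variable name to (value in a, value in b);
    None means the variable is absent on that side.
    """
    result = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key), b.get(key)
        if left != right:
            result[key] = (left, right)
    return result

# Made-up sample data, for illustration only.
env_a = {"A": "1", "B": "2"}
env_b = {"B": "3", "C": "4"}
for name, (left, right) in diff_env(env_a, env_b).items():
    print(name, left, right)
```

For PATH-like variables, splitting both values on `os.pathsep` before comparing makes ordering differences easier to spot.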

Update 2021-11-04 2

When trying different libstdc++-6.dll substitutions to ensure both Python environments use the exact same one from the same path, it appears that the libstdc++-6.dll version has no effect on the observed behaviour. It looks like the Python version changes the DLL execution. (C++ snippet updated to reflect the char* usage.)

LoneWanderer
  • ctypes can fail in strange ways since it involves hacking together a binary interface; is there a reason you aren't using CFFI which lets the compiler do its job? – o11c Nov 03 '21 at 17:54
  • If I understand correctly, CFFI is an alternative to using ctypes? TBH, I was not aware of alternatives to ctypes. https://cffi.readthedocs.io/en/latest/goals.html states: "There is no C++ support. Sometimes, it is reasonable to write a C wrapper around the C++ code and then call this C API with CFFI. Otherwise, look at other projects. I would recommend cppyy, which has got some similarities (and also works efficiently on both CPython and PyPy)." Since my DLL is supposed to be C compatible, I could give it a shot. – LoneWanderer Nov 03 '21 at 17:58
  • The problem is in the C++ code. Without knowing the code, there is no point in speculating, so this question isn't a good fit for SO. – ead Nov 03 '21 at 18:01
  • @ead I felt that too, but you never know. Sometimes some old pal with infinite black magic knowledge enters the inn and shares a few bits of wisdom ;). From the DLL interfacing perspective, as far as I can tell, the only thing that changes is the python version. – LoneWanderer Nov 03 '21 at 18:02
  • It's `self.dll.load_config_file.argtypes` (note the trailing **s**). Check [\[SO\]: C function called from Python via ctypes returns incorrect value (@CristiFati's answer)](https://stackoverflow.com/questions/58610333/c-function-called-from-python-via-ctypes-returns-incorrect-value/58611011#58611011) for more details. If it's the same problem, please mark this question as a duplicate. – CristiFati Nov 03 '21 at 18:59
  • Fixing the typo on argtype-s- doesn't change the observations. Ctypes may have some built-in robustness to this. – LoneWanderer Nov 03 '21 at 19:14
  • No it doesn't. As explained in the other answer, this is *Undefined Behavior*, so the fact it works, is because you just got "lucky". Another thing that comes into my mind, on *Win* *Python* is built with *VStudio*. Although it's a long shot, you might want to build your *.dll* with it as well (hmm kind of hard without access to the code :) ). – CristiFati Nov 03 '21 at 19:29
  • Can you share the *.dll*? – CristiFati Nov 03 '21 at 19:51
  • That is not possible, and I have no rights to install the MS VStudio (unless I go for the install request corporate process and bureaucracy). Using GDB I am sure that the passed argument string seems valid and the same regardless of the python version. Something makes the c++ execution completely different. I should re-check string termination (\0) on both sides (even if I am pretty sure ctypes.c_char_p does the job on the python side for later std::string conversion on the cpp side) – LoneWanderer Nov 03 '21 at 20:35
  • There is nothing wrong with the Python code other than the `.argtypes` mis-spelling, but since you wrap the parameter in `c_char_p()` anyway it doesn't really matter. More likely the unshared C code has undefined behavior that happens to work in one scenario and not the other. Does the input_json pointer get saved without copying the data? Once `s` goes out-of-scope that pointer would be invalid. – Mark Tolonen Nov 03 '21 at 20:49
  • I have found a potential root cause. I used `catch load my_lib` and `catch load libstdc++` to be sure which libs were loaded. For reasons I can not explain, when using the Py3.8 anaconda environment, my C++ DLL is loaded (first catch), and then it loads libstdc++-6.dll that is stored right next to it (second catch) (can't tell the version, thats the magic of black box binary deliveries ...). But when using the pip venv Py3.6, my C++ DLL is loaded (first catch), and then loads libstdc++-6.dll coming from my GNAT install (second catch). So far, this may be the major difference I can spot. – LoneWanderer Nov 04 '21 at 10:28
  • Hmmm, I wonder if it's not the same thing as https://stackoverflow.com/questions/58631512/pywin32-and-python-3-8-0 (how did I miss that???). Try `os.add_dll_directory(${YOUR_GNAT_LIB_PATH})` before loading the *.dll*. – CristiFati Nov 05 '21 at 15:16
  • Nice catch. I read the patch notes from py 3.6 to 3.7 related to ctypes, but I gave up doing so for 3.7 to 3.8. I'm going to try playing with this as soon as I can. I'll keep you informed. – LoneWanderer Nov 05 '21 at 15:50
