I have a project running on Python 3.8 x64 windows, with an anaconda environment.
It uses an inhouse C++ DLL x64, with a python wrapping using ctypes (the C++ code cannot be shared due to various reasons; the most important reason is that I do not have access to 99% of the code myself ...). The project works perfectly fine with Python 3.8 x64.
The main C++ DLL entry point takes a string path to a JSON file for initialization. The ctypes binding code looks like this:
import ctypes

class RedactedName:
    def __init__(self, some_args):  # redacted
        # some code
        self.dll = ctypes.CDLL(self.dll_path)
        self.dll.load_config_file.argtypes = [ctypes.c_char_p]
        self.dll.load_config_file.restype = ctypes.c_bool
        # some code

    def load_config_file(self, file_path: str) -> bool:
        self.conf_file_path = file_path  # path of a JSON file, the C++ will load it
        s = ctypes.c_char_p(file_path.encode('utf-8'))
        b = self.dll.load_config_file(s)  # <-- SEG FAULT HERE with py3.6.6 only. ok with py3.8
        return bool(b)
The corresponding C++ code : (namely, the 1% part I am 100% sure of)
bool load_config_file(/* in */ const char * input_json) {
    json conf;
    const std::string & json_file_path_str(input_json);
    REDACTED_FUNCTION_CALL_A(/* in */ json_file_path_str, /* out */ conf); // <-- seg fault in here, depending on python version, see GDB call stack
    // other stuff
    return true; // no joke
}
For various reasons, I need to downgrade the Python version to Python 3.6.6 x64 (with pip only; if that matters).
With Python 3.6.6 x64, the C++ call self.dll.load_config_file goes into SIGSEGV.
The C++ DLL call stack looks like this (luckily the DLL contains some symbols).
# (with python breakpoint right before DLL load)
$ attach <python process PID>
$ catch load mylibname
$ catch throw
$ c
# now, resume python execution
# when gdb breaks on dll load, add breakpoint
$ b load_config_file
exception catched
(gdb) n
Single stepping until exit from function
_ZN3REDACTED5REDACTED8REDACTED4REDACTEDKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERN8nlohmann10basic_jsonISt3mapSt6vectorS7_bxydSaNSA_14adl_serializerEEE,
which has no line number information.
Thread 1 received signal SIGSEGV, Segmentation fault.
0x000000006fcd2194 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82
82 /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h: No such file or directory.
backtrace :
#0 0x000000006fcd2194 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82
#1 __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:82
#2 __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/ext/atomicity.h:78
#3 std::locale::_Impl::_M_remove_reference (this=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/bits/locale_classes.h:564
#4 std::locale::operator= (this=this@entry=0xbc8f3ecf50, __other=
@0xbc8f3ec9f8: {static none = 0, static ctype = 1, static numeric = 2, static collate = 4, static time = 8, static monetary = 16, static messages = 32, static all = 63, _M_impl = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_classic = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_global = 0x6fcfc6a0 <(anonymous namespace)::c_locale_impl>, static _S_categories = 0x6fd06aa0 <__gnu_cxx::category_names>, static _S_once = {done = 1, started = 0}, static _S_twinned_facets = 0x6fd09140 <std::locale::_S_twinned_facets>}) at ../../../../../src/libstdc++-v3/src/c++98/locale.cc:116
#5 0x000000006fce906c in std::ios_base::_M_init (this=this@entry=0xbc8f3ece80) at ../../../../../src/libstdc++-v3/src/c++98/ios_locale.cc:44
#6 0x000000006fceaf91 in std::basic_ios<char, std::char_traits<char> >::init (this=0xbc8f3ece80, __sb=0x0) at /it/sbx/20.1/x86_64-windows/gcc/build/x86_64-pc-mingw32/libstdc++-v3/include/bits/basic_ios.tcc:126
#7 0x000000006ce0b7fb in REDACTED_FUNCTION_CALL_A(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, nlohmann::basic_json<std::map, std::vector, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, long long, unsigned long long, double, std::allocator, nlohmann::adl_serializer>&) ()
from C:\Users\<redacted>/<redacted>_lib.dll
#8 0x000000006cdc15c6 in load_config_file (input_json=0x27eee2877a0 "C:\\Users\\<a_valid_path_to_avalid_json_file>") at <redacted>
#9 0x0000000062880943 in _ctypes!DllCanUnloadNow () from C:\Program Files\Python36\DLLs\_ctypes.pyd
Additional information:
- The python calls are purely sequential, there is no threading, so the s variable cannot be garbage collected before the end of the self.dll.load_config_file call.
- The C++ DLL has not been recompiled. It is the exact same as in the Py3.8 environment. I can switch back and forth from py3.6 to 3.8 envs. 3.6 fails, 3.8 works.
- The C++ DLL depends on libwinpthread-1.dll, libgcc_s_seh-1.dll, libstdc++-6.dll
- DLL was built with GNAT PRO g++ (GCC) 8.3.1 20190923 (for GNAT Pro 20.1)
- DLL compilation directive contains: "-std=c++11", "-O0"
- DLL link directive contains "-static-libgcc"
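Since the DLL depends on three runtime DLLs that may exist in several places (next to the DLL, in the GNAT install, ...), it can help to list every directory on PATH that holds a copy of each one. The following is a minimal sketch, not a reproduction of the Windows loader: find_dll_copies is a hypothetical helper name, and it only looks at PATH (it ignores the application directory, the system directories and modules already loaded in the process).

```python
import os


def find_dll_copies(dll_name, path_value=None):
    """Return the directories of path_value (default: PATH) that contain
    dll_name, in the order they appear in the variable."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries produced by a leading/trailing/double separator.
        if directory and os.path.isfile(os.path.join(directory, dll_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    for name in ("libwinpthread-1.dll", "libgcc_s_seh-1.dll", "libstdc++-6.dll"):
        print(name, find_dll_copies(name))
```

Running it in both environments and comparing the first hit for each name shows whether a PATH-based lookup would pick the same file in both.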
Any clue as to why py3.6.6 gives this weird C++ call stack and SIGSEGV while py3.8 does not?
If the py3.8 version wasn't working, I would suppose it is a duplicate of undefined reference to `__gnu_cxx::__exchange_and_add(int*, int)'. But it is not the case ...
Hypothesis:
- std::basic_ios<char, std::char_traits<char> >::init seems related to stream buffer initialization. Could it be some conflict of std output streams? very unlikely
- python3.6 messing with env vars? very unlikely
- python3.6 messing with DLLs, creating dependency conflicts? unlikely
As far as I can tell, these questions are NOT related because function return type is bool and accurate with source code. And py3.8 works.
Update 2021-11-04 1
~~ I have found a potential root cause.~~
I used catch load my_lib and catch load libstdc++ to be sure which libs were loaded.
When using the Py3.8 anaconda environment, my C++ DLL is loaded (first catch), and then it loads the libstdc++-6.dll that is stored right next to it (second catch) (can't tell the version, that's the magic of black box binary deliveries ...).
When using the pip venv Py3.6, my C++ DLL is loaded (first catch), and then it loads the libstdc++-6.dll coming from my GNAT install (second catch).
So far, this may be the major difference I can spot. I am pretty sure the 2 libstdc++ have different versions.
I do not understand why library loading precedence differs between the two versions.
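One documented difference that may be relevant here: starting with Python 3.8 on Windows, dependencies of DLLs loaded through ctypes are no longer resolved via PATH or the current working directory, and os.add_dll_directory was added to register extra search directories. Older versions such as 3.6 still rely on the classic search order, which includes PATH. The following is only a sketch of how the dependency directory could be made explicit on both versions; register_dll_directory is a hypothetical helper name, and whether this has any effect on the crash is an assumption, not something verified.

```python
import os


def register_dll_directory(dll_dir):
    """Make dll_dir searchable for dependent DLLs.

    Returns the name of the mechanism that was used."""
    dll_dir = os.path.abspath(dll_dir)
    if hasattr(os, "add_dll_directory"):
        # Windows with Python 3.8+: register the directory explicitly.
        os.add_dll_directory(dll_dir)
        return "add_dll_directory"
    # Older Python (or non-Windows): fall back to putting the directory
    # first on PATH so it wins over other copies found through PATH.
    os.environ["PATH"] = dll_dir + os.pathsep + os.environ.get("PATH", "")
    return "PATH"
```

It would be called with the folder containing the C++ DLL before ctypes.CDLL(self.dll_path).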
The content of python os.environ is slightly different between the two envs, mainly due to the env var LIBRARY_ROOTS pointing to the respective python install lib folders (in both cases PATH contains the path to the GNAT Pro install bin dir).
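The comparison of the two environments can be made reproducible by dumping os.environ to a file in each interpreter and diffing the two dumps. A minimal sketch; dump_env and diff_env are hypothetical helper names and the file names in the usage note are placeholders.

```python
import json
import os


def dump_env(file_path):
    """Write the current process environment to file_path as JSON."""
    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(dict(os.environ), f, indent=2, sort_keys=True)


def diff_env(env_a, env_b):
    """Compare two environment mappings.

    Returns the variables present only in one of them and, for the
    variables present in both, the (value_a, value_b) pairs that differ."""
    keys_a, keys_b = set(env_a), set(env_b)
    return {
        "only_a": sorted(keys_a - keys_b),
        "only_b": sorted(keys_b - keys_a),
        "changed": {
            key: (env_a[key], env_b[key])
            for key in sorted(keys_a & keys_b)
            if env_a[key] != env_b[key]
        },
    }
```

Usage: call dump_env("env_py36.json") under 3.6 and dump_env("env_py38.json") under 3.8, then load both files with json.load and pass the two dictionaries to diff_env.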
Update 2021-11-04 2
When trying different libstdc++-6.dll substitutions to ensure both python env versions use the exact same one from the same path, it appears the libstdc++-6.dll version has no effect on the observed behaviour.
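One way to force both environments onto the same runtime files is to load the three dependencies by absolute path before loading the main DLL; as far as I understand, Windows normally reuses a module that is already loaded in the process when another DLL imports the same file name. A sketch under that assumption; preload_runtime_dlls is a hypothetical helper name, and the loader parameter exists only so the function can be exercised without the real DLLs.

```python
import ctypes
import os

# Loaded in this order so that libstdc++ finds its own dependencies
# (libgcc, winpthread) already in memory.
RUNTIME_DLLS = ("libwinpthread-1.dll", "libgcc_s_seh-1.dll", "libstdc++-6.dll")


def preload_runtime_dlls(dep_dir, names=RUNTIME_DLLS, loader=ctypes.CDLL):
    """Load each runtime DLL from dep_dir by absolute path.

    Returns the loaded handles; keep a reference to them for as long
    as the main DLL is in use."""
    dep_dir = os.path.abspath(dep_dir)
    handles = []
    for name in names:
        full_path = os.path.join(dep_dir, name)
        if not os.path.isfile(full_path):
            raise FileNotFoundError(full_path)
        handles.append(loader(full_path))
    return handles
```

It would be called once, with the folder holding the chosen libstdc++-6.dll, before ctypes.CDLL(self.dll_path).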
It looks like the Python version changes the DLL execution.
(C++ snippet updated to reflect the char* usage)