Cython undefined symbol with c wrapper

Question

I am trying to expose C code to Cython and am running into "undefined symbol" errors when trying to use functions defined in my C file from another Cython module.
Functions defined in my header files and functions using a manual wrapper work without a problem.

Basically the same case as this question, but the solution (linking against the library) isn't satisfactory for me.
I assume I am missing something in the setup.py script?

Minimized example of my case:

foo.h

int source_func(void);

inline int header_func(void){
    return 1;
}

foo.c

#include "foo.h"

int source_func(void){
    return 2;
}

foo_wrapper.pxd

cdef extern from "foo.h":
    int source_func()
    int header_func()

cdef source_func_wrapper()

foo_wrapper.pyx

cdef source_func_wrapper():
    return source_func()

The Cython module I want to use the functions in:
test_lib.pyx

cimport foo_wrapper

def do_it():
    print "header func"
    print foo_wrapper.header_func() # ok
    print "source func wrapped"
    print foo_wrapper.source_func_wrapper() # ok    
    print "source func"
    print foo_wrapper.source_func() # undefined symbol: source_func

setup.py builds both foo_wrapper and test_lib

from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

# setup wrapper
setup(
    ext_modules = cythonize([
        Extension("foo_wrapper", ["foo_wrapper.pyx", "foo.c"])
    ])
)

# setup test module 
setup(
    ext_modules = cythonize([
        Extension("test_lib", ["test_lib.pyx"])
    ])
)

ead · Accepted Answer · 2019-05-10T20:26:46.017

There are 3 different types of function in foo_wrapper:

source_func_wrapper is a cdef function exported via the pxd; Cython looks it up through the Python run-time when test_lib is imported, so no linker is involved.
header_func is an inline function which is used at compile time, so its definition/machine code is not needed later on.
source_func on the other hand must be handled by the static linker (this is the case in foo_wrapper) or the dynamic linker (I assume this is your wish for test_lib).

Further down I'll try to explain why the setup doesn't work out of the box, but first I would like to introduce the two (at least in my opinion) best alternatives:

A: avoid this problem altogether. Your foo_wrapper wraps C functions from foo.h. That means every other module should use these wrapper functions. If everyone just can access the functionality directly - this makes the whole wrapper kind of obsolete. Hide the foo.h interface in your pyx file:

#foo_wrapper.pxd
cdef source_func_wrapper()
cdef header_func_wrapper()


#foo_wrapper.pyx
cdef extern from "foo.h":
    int source_func()
    int header_func()

cdef source_func_wrapper():
    return source_func()
cdef header_func_wrapper():
    return header_func()

B: It might be valid to want to use the foo functionality directly via C functions. In this case we should use the same strategy as Cython does with the stdc++ library: foo.c should become a shared library and there should be only a foo.pxd file (no pyx!) which can be imported via cimport wherever needed. Additionally, libfoo.so should then be added as a dependency to both foo_wrapper and test_lib.

However, approach B means more hassle - you need to put libfoo.so somewhere the dynamic loader can find it...
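A rough sketch of what the build for approach B might look like. It assumes libfoo.so was built beforehand (for example with gcc -shared -fPIC foo.c -o libfoo.so), sits next to the extension modules, and that foo.c is no longer listed as a source of foo_wrapper. The helper names shared_foo_options and build are made up for this illustration.

```python
def shared_foo_options(lib_dir="."):
    """Link options shared by every extension that calls into libfoo.so.

    Assumption: libfoo.so already exists in lib_dir.
    """
    return {
        "libraries": ["foo"],        # becomes -lfoo on the linker command line
        "library_dirs": [lib_dir],   # becomes -L<lib_dir>
        # look for libfoo.so next to the extension module at run time
        "extra_link_args": ["-Wl,-rpath=$ORIGIN"],
    }


def build():
    """Call this from setup.py.

    The imports are local so the helper above can be inspected
    without setuptools or Cython being installed.
    """
    from setuptools import setup, Extension
    from Cython.Build import cythonize

    setup(ext_modules=cythonize([
        Extension("foo_wrapper", ["foo_wrapper.pyx"], **shared_foo_options()),
        Extension("test_lib", ["test_lib.pyx"], **shared_foo_options()),
    ]))
```

Both extensions get identical link options, so source_func is resolved against the same libfoo.so from either module.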

Other alternatives:

As we will see, there are a lot of ways to get foo_wrapper+test_lib to work. First, let's see in more detail, how loading of dynamic libraries works in python.

We start out by taking a look at the test_lib.so at hand:

>>> nm test_lib.so --undefined
....
   U PyXXXXX
   U source_func

There are a lot of undefined symbols, most of which start with Py and will be provided by the python executable at run time. But there is also our evildoer - source_func.
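When the list of undefined symbols is long, a small script can pick out the interesting ones. The following is only a sketch: the function names are invented here, and filtering on the Py/_Py prefix is a heuristic, so symbols provided by libc and other libraries will still show up next to source_func.

```python
import subprocess


def undefined_symbols(nm_output):
    """Return the names marked 'U' (undefined) in the output of nm."""
    names = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            # drop a symbol-version suffix such as @GLIBC_2.2.5
            names.append(parts[1].split("@")[0])
    return names


def non_python_undefined(nm_output):
    """Filter out the symbols that the python executable provides."""
    return [name for name in undefined_symbols(nm_output)
            if not name.startswith(("Py", "_Py"))]


def run_nm(path):
    """Run nm on a shared object (requires binutils to be installed)."""
    result = subprocess.run(["nm", "--undefined-only", path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

It would be used as non_python_undefined(run_nm("test_lib.so")).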

Now, we start python via

LD_DEBUG=libs,files,symbols python

and load our extension via import test_lib. In the triggered debug trace we can see the following:

>>>>: file=./test_lib.so [0];  dynamically loaded by python [0]

python loads test_lib.so via dlopen and starts to look-up/resolve the undefined symbols from test_lib.so:

>>>>:  symbol=PyExc_RuntimeError;  lookup in file=python [0]
>>>>:  symbol=PyExc_TypeError;  lookup in file=python [0]

these python symbols are found pretty quickly - they are all defined in the python executable, the first place the dynamic linker looks (if this executable was linked with -Wl,-export-dynamic). But it is different with source_func:

 >>>>: symbol=source_func;  lookup in file=python [0]
 >>>>: symbol=source_func;  lookup in file=/lib/x86_64-linux-gnu/libpthread.so.0 [0]
  ...
 >>>>: symbol=source_func;  lookup in file=/lib64/ld-linux-x86-64.so.2 [0]
 >>>>:  ./test_lib.so: error: symbol lookup error: undefined symbol: source_func (fatal)

So after searching all loaded shared libraries the symbol is not found and we have to abort. The fun fact is that foo_wrapper is not yet loaded, so source_func cannot be looked up there (it would be loaded in the next step by python, as a dependency of test_lib).

What happens if we start python with preloaded foo_wrapper.so?

  LD_DEBUG=libs,files,symbols LD_PRELOAD=$(pwd)/foo_wrapper.so python

this time, calling import test_lib succeeds, because the preloaded foo_wrapper is the first place the dynamic loader looks up the symbols (after the python executable):

  >>>>: symbol=source_func;  lookup in file=python [0]
  >>>>: symbol=source_func;  lookup in file=/home/ed/python_stuff/cython/two/foo_wrapper.so [0]

But how does it work, when foo_wrapper.so is not preloaded? First let's add foo_wrapper.so as library to our setup of test_lib:

ext_modules = cythonize([
    Extension("test_lib", ["test_lib.pyx"], 
              libraries=[':foo_wrapper.so'], 
              library_dirs=['.'],
    )])

this would lead to the following linker command:

 gcc ... test_lib.o -L. -l:foo_wrapper.so -o test_lib.so

If we now look up the symbols, we see no difference:

>>> nm test_lib.so --undefined
....
   U PyXXXXX
   U source_func

source_func is still undefined! So what is the advantage of linking against the shared library? The difference is that foo_wrapper.so is now listed as needed for test_lib.so:

>>>> readelf -d test_lib.so| grep NEEDED
0x0000000000000001 (NEEDED)             Shared library: [foo_wrapper.so]
0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

ld does not do the actual linking; that is the job of the dynamic linker. It does, however, do a dry run and helps the dynamic linker by noting that foo_wrapper.so is needed in order to resolve the symbols, so it must be loaded before the search for the symbols starts. However, it does not explicitly say that the symbol source_func must be looked up in foo_wrapper.so - we could actually find it and use it anywhere.
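The NEEDED entries can also be extracted programmatically, for instance to check in a build script that the dependency was really recorded. A minimal sketch, assuming the readelf -d output format shown above; the function name is made up:

```python
import re

# matches lines such as:
# 0x0000000000000001 (NEEDED)   Shared library: [foo_wrapper.so]
_NEEDED = re.compile(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the library names from the NEEDED entries, in file order."""
    return _NEEDED.findall(readelf_output)
```

Feeding it the readelf output from above yields foo_wrapper.so, libpthread.so.0 and libc.so.6; note that only the library names are recorded, not which symbol comes from which library.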

Let's start python again, this time without preloading:

  >>>> LD_DEBUG=libs,files,symbols python
  >>>> import test_lib
  ....
  >>>> file=./test_lib.so [0];  dynamically loaded by python [0]....
  >>>> file=foo_wrapper.so [0];  needed by ./test_lib.so [0]
  >>>> find library=foo_wrapper.so [0]; searching
  >>>> search cache=/etc/ld.so.cache
  .....
  >>>> foo_wrapper.so: cannot open shared object file: No such file or directory.

Ok, now the dynamic linker knows it has to find foo_wrapper.so, but it is nowhere in the search path, so we get an error message.

We have to tell the dynamic linker where to look for the shared libraries. There are many ways to do this; one of them is to set LD_LIBRARY_PATH:

 LD_DEBUG=libs,symbols,files LD_LIBRARY_PATH=. python
 >>>> import test_lib
 ....
 >>>> find library=foo_wrapper.so [0]; searching
 >>>> search path=./tls/x86_64:./tls:./x86_64:.     (LD_LIBRARY_PATH) 
 >>>> ...
 >>>> trying file=./foo_wrapper.so
 >>>> file=foo_wrapper.so [0];  generating link map

This time foo_wrapper.so is found (dynamic loader looked at places hinted at by LD_LIBRARY_PATH), loaded and then used for resolving the undefined symbols in test_lib.so.
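The three situations seen so far can be summarized with a toy model of the lookup. This is a deliberate simplification and not how the loader is implemented (it ignores symbol versioning, weak symbols and many other rules); all names are invented for the illustration. The global scope holds the executable and preloaded libraries, the local scope holds the library being loaded plus its NEEDED dependencies.

```python
def resolve(symbol, global_scope, local_scope):
    """Return the name of the first object that defines symbol, or None.

    Both scopes are lists of (object_name, set_of_exported_symbols);
    the global scope is searched before the local one.
    """
    for name, exports in list(global_scope) + list(local_scope):
        if symbol in exports:
            return name
    return None


python_exe = ("python", {"PyExc_TypeError", "PyExc_RuntimeError"})
foo_wrapper = ("foo_wrapper.so", {"source_func"})
test_lib = ("test_lib.so", set())

# 1. original setup: nobody who is loaded defines source_func
plain = resolve("source_func", [python_exe], [test_lib])

# 2. LD_PRELOAD puts foo_wrapper.so right after the executable
preloaded = resolve("source_func", [python_exe, foo_wrapper], [test_lib])

# 3. a NEEDED entry makes the loader pull in foo_wrapper.so with test_lib.so
linked = resolve("source_func", [python_exe], [test_lib, foo_wrapper])
```

In this model plain is None, while preloaded and linked both name foo_wrapper.so.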

But what changes if the runtime_library_dirs setup argument is used?

 ext_modules = cythonize([
    Extension("test_lib", ["test_lib.pyx"], 
              libraries=[':foo_wrapper.so'], 
              library_dirs=['.'],               
              runtime_library_dirs=['.']
             )
])

and now calling

 LD_DEBUG=libs,symbols,files python
 >>>> import test_lib
 ....
 >>>> file=foo_wrapper.so [0];  needed by ./test_lib.so [0]
 >>>> find library=foo_wrapper.so [0]; searching
 >>>> search path=./tls/x86_64:./tls:./x86_64:.     (RPATH from file ./test_lib.so)
 >>>>     trying file=./foo_wrapper.so
 >>>>   file=foo_wrapper.so [0];  generating link map

foo_wrapper.so is found via a so-called RPATH even though LD_LIBRARY_PATH is not set. We can see this RPATH being inserted by the static linker:

  >>>> readelf -d test_lib.so | grep RPATH
        0x000000000000000f (RPATH)              Library rpath: [.]

however this is the path relative to the current working directory, which is most of the time not what is wanted. One should pass an absolute path or use

   ext_modules = cythonize([
              Extension("test_lib", ["test_lib.pyx"], 
              libraries=[':foo_wrapper.so'],
              library_dirs=['.'],                   
              extra_link_args=["-Wl,-rpath=$ORIGIN/."] #rather than runtime_library_dirs
             )
])

to make the path relative to the current location (which can change, for example through copying/moving) of the resulting shared library. readelf now shows:

>>>> readelf -d test_lib.so | grep RPATH
     0x000000000000000f (RPATH)              Library rpath: [$ORIGIN/.]

which means the needed shared library will be searched for relative to the path of the loaded shared library, i.e. test_lib.so.
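The difference between an rpath of "." and "$ORIGIN/." can be illustrated by mimicking how the entry is turned into a directory. This is only a sketch of the semantics; expand_rpath is a made-up helper, it handles only the $ORIGIN spelling, and the paths in the usage note are examples.

```python
import os


def expand_rpath(entry, library_path, cwd=None):
    """Return the directory an rpath entry refers to.

    $ORIGIN is replaced by the directory containing the loaded library;
    any other relative entry is taken relative to the working directory.
    """
    lib_dir = os.path.dirname(os.path.abspath(library_path))
    expanded = entry.replace("$ORIGIN", lib_dir)
    if not os.path.isabs(expanded):
        expanded = os.path.join(cwd or os.getcwd(), expanded)
    return os.path.normpath(expanded)
```

For a library at /opt/pkg/test_lib.so and a process started in /tmp, the entry "$ORIGIN/." expands to /opt/pkg wherever python is started, while "." expands to /tmp.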

That is also how your setup should look if you would like to reuse the symbols from foo_wrapper.so, which I do not advocate.

There are however some possibilities to use the libraries you have already built.

Let's go back to original setup. What happens if we first import foo_wrapper (as a kind of preload) and only then test_lib? I.e.:

 >>>> import foo_wrapper
 >>>> import test_lib

This doesn't work out of the box. But why? Obviously, the loaded symbols from foo_wrapper are not visible to other libraries. Python uses dlopen for dynamic loading of shared libraries, and as explained in this good article, there are several different strategies possible. We can use

 >>>> import sys
 >>>> sys.getdlopenflags() 
 >>>> 2

to see which flags are set. 2 means RTLD_NOW, which means that the symbols are resolved directly upon the loading of the shared library. We need to OR the flags with RTLD_GLOBAL=256 to make the symbols visible globally/outside of the dynamically loaded library.

>>> import sys; import ctypes;
>>> sys.setdlopenflags(sys.getdlopenflags()| ctypes.RTLD_GLOBAL)
>>> import foo_wrapper
>>> import test_lib

and it works, our debug trace shows:

>>> symbol=source_func;  lookup in file=./foo_wrapper.so [0]
>>> file=./foo_wrapper.so [0];  needed by ./test_lib.so [0] (relocation dependency)

Another interesting detail: foo_wrapper.so is loaded only once, because python does not load a module twice via import foo_wrapper. But even if it were opened twice, it would be in memory only once (the second dlopen only increases the reference count of the shared library).
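Changing the dlopen flags affects every extension module imported afterwards, so it may be preferable to restore them once foo_wrapper is loaded. One possible way to do this is a small context manager; global_symbols is a name chosen for this sketch, and os.RTLD_GLOBAL is used here instead of ctypes.RTLD_GLOBAL.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_symbols():
    """Temporarily add RTLD_GLOBAL to the flags python passes to dlopen."""
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # later imports use the original flags again
        sys.setdlopenflags(old_flags)


# Intended usage (commented out, as it needs the built modules):
# with global_symbols():
#     import foo_wrapper   # its symbols become globally visible
# import test_lib          # can now resolve source_func
```

Only foo_wrapper has to be loaded with RTLD_GLOBAL; test_lib merely consumes the symbols.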

But now, with the insight we have gained, we can go even further:

 >>>> import sys;
 >>>> sys.setdlopenflags(1|256)#RTLD_LAZY+RTLD_GLOBAL
 >>>> import test_lib
 >>>> test_lib.do_it()
 >>>> ... it works! ....

Why does this work? RTLD_LAZY means that the symbols are resolved not directly upon loading but when they are used for the first time. But before the first usage (test_lib.do_it()), foo_wrapper is loaded (it is imported inside the test_lib module) and due to RTLD_GLOBAL its symbols can be used for resolving later on.

If we don't use RTLD_GLOBAL, the failure comes only when we call test_lib.do_it(), because the needed symbols from foo_wrapper are not seen globally in this case.
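The numbers 1, 2 and 256 used above are the values of these flags on Linux with glibc; other platforms may use different values. The os module exposes them by name, so a flag combination can be decoded without magic numbers. The helper below is a sketch with an invented name:

```python
import os


def describe_dlopen_flags(flags):
    """Return the names of the dlopen flags that are set in flags."""
    names = []
    for name in ("RTLD_LAZY", "RTLD_NOW", "RTLD_GLOBAL"):
        # a flag missing on this platform is treated as not set
        value = getattr(os, name, 0)
        if value and flags & value:
            names.append(name)
    return names
```

Passing it the result of sys.getdlopenflags() shows which binding mode and visibility the next import of an extension module will use.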

As to the question of why it is not such a great idea to just link both modules foo_wrapper and test_lib against foo.c: singletons, see this. Each module would get its own copy of foo.c, including any static or global state it holds.

RE: "If everyone just can access the functionality directly - this makes the whole wrapper obsolete": This is exactly what I plan to do though. cdef'ed classes for the common cases for py and cython, but still be able to use the core lib functions from cython if required. IMHO that is a valid use case. - SleepProgger, Aug 13 '17 at 12:38
I am a bit irritated that I have to do it this way tbh. Cython embeds my `foo.c` code, why can't I just call it like I can call cdef'ed functions (which are also declared in a pxd -> converted to .h and defined in pyx -> converted to .c file). Basically the exact same scenario, or what am I missing here? - SleepProgger, Aug 13 '17 at 12:42
@SleepProgger probably "obsolete" and "muddy design" is too strong language, there is some overhead involved in calling functions via wrapper so one may want to have direct C-calls. - ead, Aug 14 '17 at 06:41
Sorry, I thought I already accepted your answer. Thank you for the detailed information. Another way, if one only wants to expose some specific functions, is simply declaring function pointers in the pxd file and defining them in the pyx file. - SleepProgger, Aug 29 '17 at 18:50
