tl;dr

Under what circumstances could I expect to see library.so => not found when:

  1. The linker search path is correct, as verified by other libraries in the same directory being linked.
  2. The so file in question actually exists, is a regular file, is nonempty, and is linked successfully by other tests in the project.

Background

I'm building a library project, call it libspecific, which depends on a library, call it libgeneric. libspecific has a few submodules, two of which are relevant, call them libspecific_base and libspecific_extension. Both libspecific_base and libspecific_extension depend on a few libraries from libgeneric, call them libgeneric_utils, libgeneric_math, libgeneric_geometry, and libgeneric_algorithms. libspecific_extension also depends on libspecific_base.

Typically I want to install libgeneric from my company's internal apt repositories, but it's possible to build and install from source. In either case, the directory structure in /usr/lib/libgeneric looks like this:

/usr/lib/libgeneric
├── libgeneric_utils.so
├── libgeneric_math.so
├── libgeneric_geometry.so
└── libgeneric_algorithms.so

Installing from the internal apt repositories, I see something like

$ file libgeneric_geometry.so                                                                
libgeneric_geometry.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=6eba11784651535bdebd995af672589f99f95688, stripped

This makes sense, because debug symbols are built and then stripped and shipped in a separate -dev package. If I build and install from source, but without debug symbols, I see pretty much what's above, only not stripped at the end.

Problem

I'm trying to build tests for libspecific_base. If I build and install from the libgeneric source, I have no problem. If I use the pre-built libraries, I get this error:

/home/alan/projects/libspecific/cmake-build-release-coverage/libspecific_base_tests: error while loading shared libraries: libgeneric_geometry.so: cannot open shared object file: No such file or directory
CMake Error at /usr/share/cmake-3.19/Modules/GoogleTestAddTests.cmake:77 (message):
  Error running test executable.

    Path: '/home/alan/projects/libspecific/cmake-build-release-coverage/libspecific_base_tests'
    Result: 127
    Output:

If I inspect libspecific_base.so, I get the following:

$ ldd libspecific_base.so | egrep -e '(libgeneric|geometry)'
    libgeneric_utils.so => /usr/lib/libgeneric/libgeneric_utils.so (0x00007f7a798b9000)
    libgeneric_math.so => /usr/lib/libgeneric/libgeneric_math.so (0x00007f7a77cc7000)
    libgeneric_algorithms.so => /usr/lib/libgeneric/libgeneric_algorithms.so (0x00007f7a77a1c000)
    libgeneric_geometry.so => not found

So it can't be the case that LD_LIBRARY_PATH is wrong. If I do readelf -d libspecific_base.so | grep libgeneric, I get:

$ readelf -d libspecific_base.so | grep libgeneric

0x0000000000000001 (NEEDED)             Shared library: [libgeneric_utils.so]
0x0000000000000001 (NEEDED)             Shared library: [libgeneric_math.so]
0x0000000000000001 (NEEDED)             Shared library: [libgeneric_algorithms.so]

but no libgeneric_geometry.so. I also see /usr/lib/libgeneric when I do readelf -d libspecific_base.so | grep RUNPATH.

What's weird is that I can still build and run tests for libspecific_extension. If I inspect libspecific_extension.so I get:

$ ldd libspecific_extension.so | egrep -e '(libgeneric|geometry)'
    libgeneric_utils.so => /usr/lib/libgeneric/libgeneric_utils.so (0x00007f7a798b9000)
    libgeneric_math.so => /usr/lib/libgeneric/libgeneric_math.so (0x00007f7a77cc7000)
    libgeneric_geometry.so => /usr/lib/libgeneric/libgeneric_geometry.so (0x00007fbfc6a85000)
    libgeneric_algorithms.so => /usr/lib/libgeneric/libgeneric_algorithms.so (0x00007f7a77a1c000)

Question

I don't necessarily need a specific answer to this problem, but I would like to know what could cause this so I can investigate further. I'm far from an expert on linker.

Alan Liddell
  • So, unless I missed the main point of the problem, the only issue here is that the pre-built binaries don't contain that libgeneric_geometry.so? – rturrado Oct 12 '21 at 16:53
  • They do contain it, it's just that `libspecific_base` fails to link against it while `libspecific_extension` doesn't fail. – Alan Liddell Oct 12 '21 at 17:04
  • Could it be that you only have one version of `libgeneric_geometry` in your system, let's say the non stripped version, and that `libspecific_extension.so` uses the non stripped version (great, found!) whereas `libspecific_base` links to the stripped version (ouch, not found!)? Something similar to what happened to this guy: https://stackoverflow.com/q/25314983/260313 – rturrado Oct 12 '21 at 17:18
  • No, it turns out that `libspecific_extension.so` links to the stripped or the non-stripped versions without complaining one way or the other. – Alan Liddell Oct 12 '21 at 17:34
  • Reasons it wouldn't be found: (1) incorrect LD_LIBRARY_PATH, rpath, etc.; (2) the path/file doesn't exist; (3) the file found is incompatible (like 32-bit vs 64-bit libraries). – RMiller Oct 12 '21 at 23:00

1 Answer

The best way to debug this kind of problem is by running the binary with LD_DEBUG=files,libs.
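As a rough sketch of what that looks like in practice (assuming a glibc-based system; trace_lib_search is a hypothetical helper name, and /bin/ls merely stands in for the failing test executable), the loader's trace can be filtered down to the lines that show which library is being sought, who asked for it, and which files were tried:

```shell
# trace_lib_search CMD [ARGS...]: run CMD with loader debugging enabled and
# keep only the lines describing the library search.
# LD_DEBUG output is written to stderr, so stderr is sent into the pipe and
# the program's own stdout is discarded.
trace_lib_search() {
    LD_DEBUG=libs,files "$@" 2>&1 >/dev/null |
        grep -E 'find library=|needed by|search (path|cache)=|trying file='
}

# Example run against a binary that exists on most systems.
trace_lib_search /bin/ls | head -n 20
```

For the failing case, the interesting part is the block that starts with find library=libgeneric_geometry.so: the search path lines under it list the directories that were actually consulted for that particular library.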

> So it can't be the case that LD_LIBRARY_PATH is wrong.

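Yes, it can. In fact LD_LIBRARY_PATH may not be set at all -- you have RUNPATH set, and the loader searches that too. Note the order: DT_RUNPATH is searched after LD_LIBRARY_PATH (only the older DT_RPATH is searched before it), and it is used only for the direct NEEDED entries of the object that carries it, not for the dependencies of those libraries. See man ld.so.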

Also note that the best practice is to build binaries such that they don't need LD_LIBRARY_PATH. Using the environment variable opens you up to "works for me, but not for the other guy" bugs and unpredictability.
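To check which of these mechanisms a given object actually relies on, one option is to print its embedded RPATH/RUNPATH entries next to the LD_LIBRARY_PATH currently in effect. This is only a sketch: show_search_hints is a hypothetical helper name, and /bin/ls is a stand-in for libspecific_base.so or the test executable.

```shell
# show_search_hints FILE: print the RPATH/RUNPATH entries embedded in FILE
# (if any), followed by the current value of LD_LIBRARY_PATH.
show_search_hints() {
    readelf -d "$1" | grep -E '\((RPATH|RUNPATH)\)' ||
        echo "no RPATH/RUNPATH in $1"
    echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH:-<unset>}"
}

show_search_hints /bin/ls
```

If LD_LIBRARY_PATH turns out to be unset, then the libraries that do resolve are being found through RUNPATH or the system cache, and the environment variable proves nothing either way.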

> I'm far from an expert on linker.

The linker is out of the picture by the time you are running an executable.

It's the runtime loader which performs searches for the libraries, and your question is mostly about the ld.so behavior.
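One thing worth checking along these lines: ldd reports the whole dependency tree, while readelf -d shows only an object's own NEEDED entries. In the question, libgeneric_geometry.so appears in the ldd output of libspecific_base.so but not among its NEEDED entries, which suggests (this is a reading of the output, not something the question confirms) that it is pulled in by one of the other libraries. Per man ld.so, a RUNPATH is not applied to the dependencies of dependencies, so the library that needs it has to locate it through its own RUNPATH, LD_LIBRARY_PATH, or the system cache. A sketch for finding the requester, using a hypothetical helper named who_needs:

```shell
# who_needs TARGET NEEDLE: print TARGET and/or each of its resolved
# dependencies whose own NEEDED entries mention NEEDLE.
who_needs() {
    target="$1"
    needle="$2"
    {
        echo "$target"
        # Keep only "name => /absolute/path" lines; skips "not found" entries.
        ldd "$target" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
    } | while read -r obj; do
        if readelf -d "$obj" 2>/dev/null | grep 'NEEDED' | grep -q -- "$needle"; then
            echo "$obj"
        fi
    done
}

# Example on a binary that exists on most systems: which objects need libc?
who_needs /bin/ls libc.so
```

In the question's setup the call would be who_needs ./libspecific_base.so libgeneric_geometry; running readelf -d on whichever library it prints, and looking for a RUNPATH there, would be the next step.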

Employed Russian