18

We have two shared libraries, liba.so and libb.so; liba.so uses libb.so. All C files are compiled with -fPIC, and linking uses -shared. When we call dlopen on liba.so, it cannot find symbols in libb.so and we get an "undefined symbol" error. We can dlopen libb.so with no errors. We know that liba is finding libb because we don't get a "file not found" error; we do get one when we delete libb.so. We tried -lutil with no luck.

Any ideas????

oh yeah. gcc 4.1.2

update: We use rpath when linking liba so it can find libb.

ldd liba.so returns:

linux-gate.so.1 => (0xffffe000)  
libb.so => ./libb.so (0xf6ef9000)  <-------- LIBB 
libutil.so.1 => /lib/libutil.so.1 (0xf6ef5000)  
libdl.so.2 => /lib/libdl.so.2 (0xf6ef1000)  
libm.so.6 => /lib/libm.so.6 (0xf6ec9000)  
libpthread.so.0 => /lib/libpthread.so.0 (0xf6eb1000)  
librt.so.1 => /lib/librt.so.1 (0xf6ea8000)  
libc.so.6 => /lib/libc.so.6 (0xf6d62000)  
/lib/ld-linux.so.2 (0x007d0000)   

Is it significant that there is no .# at the end of libb?

johnnycrash
  • 2
    You are saying: you created two libs (-fPIC -shared), liba.so and libb.so. liba.so is dynamically linked (or it should be...) with libb.so and uses it. In a program X you try dlopen on libb.so and everything is ok; another test program Y tries to dlopen liba.so but it fails. Nonetheless, you know liba.so finds libb.so correctly, since you tried deleting libb.so and a different error was raised... Which options are you using for dlopen? – ShinTakezou Jun 07 '10 at 17:20
  • You got it all right. Right now we use no options, because dlopen is called from some program we have no control over. – johnnycrash Jun 07 '10 at 17:27
  • What does command `ldd liba.so` say? – el.pescado - нет войне Jun 07 '10 at 18:01
  • ldd says libb.so => ./libb.so (0xf6ef9000) among other things. All the other lines have an extra .# after the so name, like "libutil.so.1 => /lib/libutil.so.1 (0xf6ef5000)." Is it significant that there is no .# after libb.so? – johnnycrash Jun 07 '10 at 18:38
  • looks like libb.so is searched in current directory. Where do you start your application from? If it's different directory, it's normal for libb.so to not be found. – Dmitry Yudakov Jun 08 '10 at 07:25
  • But wouldn't I get the same file not found error I get when I delete the libb.so? We get a different error, one that suggests libb is found, but is missing the symbol. libb is in the same dir as liba. – johnnycrash Jun 08 '10 at 08:45
  • 1
    In this case you should check the symbol's definition - if it's defined or just declared – Dmitry Yudakov Jun 09 '10 at 07:11

2 Answers

29

You can easily check where libb.so is expected to be with ldd command:

 $ ldd liba.so
    linux-gate.so.1 =>  (0xb77b0000)
    libb.so.1 => not found
    libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb75b6000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb7572000)
    libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb742b000)
    /lib/ld-linux.so.2 (0xb77b1000)

If it's not found, libb.so's path should be added to /etc/ld.so.conf or shell variable LD_LIBRARY_PATH.

Another way is setting rpath in the liba.so itself - it's basically hardcoding its path so when the binary is started the dynamic linker would know where to search for the shared libraries.

If rpath is not set, the dynamic linker first searches LD_LIBRARY_PATH, then the paths listed in /etc/ld.so.conf (or /etc/ld.so.conf.d/). After adding a path to ld.so.conf, don't forget to run /sbin/ldconfig.

The dynamic linker looks up dependent shared libraries by their soname, if one is set (with -Wl,-soname,libb.so.1 for example). If no soname is set, it looks them up by the library's file name.

Example: libb.so.1.0 is your actual library, with the soname libb.so.1. You would normally have the following file layout:

libb.so -> libb.so.1
libb.so.1 -> libb.so.1.0
libb.so.1.0

where libb.so and libb.so.1 are symlinks.

You usually link to libb.so, when building some application or other library, depending on libb.so.

gcc -shared -Wl,-soname,liba.so.1 -o liba.so.1.2 -L/libb/path -lb

When the application is started (or dlopen is called, as in your case), the dynamic linker searches for a file named libb.so.1, the soname of the dependent library (if one is set), not libb.so.

That's why you need that symlink libb.so.1, pointing to the actual library.

If you use ld.so.conf and ldconfig, it will create the symlink with soname's name, pointing to the library file, if this symlink is missing.

You can see ld-linux man page for more useful info.


If the library is found but some of the symbols are missing, try building libb.so with the -Wl,--no-undefined option:

gcc -shared -Wl,-soname,libb.so.1 -Wl,--no-undefined -o libb.so.1.2

It should give you an error if you forgot to define a symbol.
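To see exactly which symbol the loader is complaining about, it can help to reproduce the dlopen call in a small test harness of your own. The following is only a minimal sketch: `check_library` is a hypothetical helper name, and it assumes the failing program loads the library in a comparable way. `RTLD_NOW` makes dlopen resolve every undefined symbol before it returns, and `dlerror` returns the loader's message, which normally names the missing file or symbol.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load a shared library and resolve all of
 * its symbols immediately. Returns 0 on success, -1 on failure. */
int check_library(const char *path)
{
    assert(path != NULL);

    /* RTLD_NOW resolves every undefined symbol before dlopen returns,
     * so a missing symbol fails here rather than at the first call. */
    void *handle = dlopen(path, RTLD_NOW);
    if (handle == NULL) {
        const char *msg = dlerror();  /* loader's description of the failure */
        fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                msg != NULL ? msg : "unknown error");
        return -1;
    }

    dlclose(handle);
    return 0;
}
```

Calling `check_library("./liba.so")` from a small `main` should print the same "undefined symbol" message the real application gets. On older glibc versions the program has to be linked with -ldl.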

Dmitry Yudakov
  • we do use rpath. We are pretty sure it's finding the library, because we tried running the app after deleting libb and we get a file not found error. When we run with libb we get an undefined symbol error. – johnnycrash Jun 07 '10 at 17:51
  • i might not have understood the part about soname...can you clarify that a bit? – johnnycrash Jun 07 '10 at 18:24
  • I will mess around with that and see what happens. Also, liba uses libb. The app calls dlopen(liba) and gets the error about a symbol used by liba that is in libb. – johnnycrash Jun 08 '10 at 08:44
  • yes, my mistake, I confused what depends on what, it should be fixed now. – Dmitry Yudakov Jun 08 '10 at 09:33
  • BTW, I made a presumption that libb is not found somehow, but it could just not have some of the symbols defined. See my last edit about no-undefined option. – Dmitry Yudakov Jun 08 '10 at 09:41
3

Do not forget that the order of libraries (all the -lxxx arguments) matters (at least with gcc) when linking your objects and libraries into an executable.

Short example:

LIBS=-L. -ltest1 -ltest2

OBJS=code1.o code2.o

gcc $(LIBS) $(OBJS) -o mysoft

which can fail in some cases, whereas

gcc $(OBJS) -o mysoft $(LIBS)

won't
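Why the order matters can be illustrated with a toy model of a single left-to-right pass. This is a simplified sketch, not how ld is actually implemented: the `struct input` type and `unresolved_after_link` function are made up for illustration, each input defines and needs at most one symbol, and libraries are assumed to be pulled in only when they satisfy a reference that is already pending (as with static archives, or shared libraries under --as-needed).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_INPUTS 16

/* One command-line input: an object file or a library. */
struct input {
    bool is_library;      /* libraries are only pulled in on demand */
    const char *defines;  /* symbol provided, or NULL */
    const char *needs;    /* symbol required, or NULL */
};

static bool contains(const char *set[], size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return true;
    return false;
}

/* Scan the inputs once, left to right, and return the number of
 * symbols that are still undefined at the end. */
size_t unresolved_after_link(const struct input *in, size_t count)
{
    const char *defined[MAX_INPUTS];
    const char *undefined[MAX_INPUTS];
    size_t ndef = 0, nundef = 0;

    assert(in != NULL && count <= MAX_INPUTS);

    for (size_t i = 0; i < count; i++) {
        /* A library is skipped unless it satisfies a pending reference. */
        if (in[i].is_library &&
            !(in[i].defines && contains(undefined, nundef, in[i].defines)))
            continue;

        if (in[i].defines) {
            defined[ndef++] = in[i].defines;
            /* Drop the symbol from the pending list if it was waiting. */
            for (size_t j = 0; j < nundef; j++) {
                if (strcmp(undefined[j], in[i].defines) == 0) {
                    undefined[j] = undefined[--nundef];
                    break;
                }
            }
        }

        /* Record a new pending reference if nothing seen so far defines it. */
        if (in[i].needs &&
            !contains(defined, ndef, in[i].needs) &&
            !contains(undefined, nundef, in[i].needs))
            undefined[nundef++] = in[i].needs;
    }
    return nundef;
}
```

In this model, putting a library before the object that needs it leaves the reference unresolved, because nothing was pending when the library was scanned. Putting the object first records the reference, and the library that follows satisfies it.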

Vincent Fenet
  • 1
    Hey, this answer saved my compilation. I looked into it elsewhere and nobody else is mentioning the order of the `-o` flag (only the order of objects and libraries), although in your example you move your libs to come after `-o` as well as after the objects. Do you know whether this ever matters? – fuzzyTew Nov 28 '21 at 17:23
  • This worked for me, but I cannot understand why though? – Cyclonecode Dec 07 '21 at 00:27
  • Because the linker searches from left to right, and notes unresolved symbols as it goes. For cyclically dependent libraries you also need to specify each library more than once. More info here: https://stackoverflow.com/questions/45135/why-does-the-order-in-which-libraries-are-linked-sometimes-cause-errors-in-gcc – Vincent Fenet Dec 20 '21 at 14:35