
My program uses dlopen to load a shared object and later dlclose to unload it. Sometimes this shared object is loaded once again. I noticed static variables are not re-initialized (something which is crucial to my program) so I added a test (dlopen with RTLD_NOLOAD) after dlclose to see if the library is really unloaded. Sure enough, it was still in memory.

I then tried calling dlclose repeatedly until the library is really unloaded, but what I got was an infinite loop. This is the code I'm using to check if the library was unloaded:

dlclose(handles[name]);

do {
  void *handle = dlopen(filenames[name], RTLD_NOW | RTLD_NOLOAD);
  if (!handle)
    break;

  dlclose(handle);
} while (true);

My question is: what are the possible reasons for my shared object not being unloaded after dlclose, given that my dlopen calls are the only places where it is loaded? Can you suggest a course of action to track down the source of the problem? Also, why do repeated calls to dlclose have no effect? They are each decrementing the reference count, aren't they?

EDIT: Just found out that this happens only when I compile with gcc. With clang, everything is just fine.

Elektito
  • Have you tried RTLD_LAZY as the dlopen flag instead of yours? – Krozark Jun 28 '14 at 13:45
  • _'they are each decrementing the reference count, aren't they?'_ No, the subsequent calls aren't in your current process. Check the return value, your handle is invalid after the 1st call. – πάντα ῥεῖ Jun 28 '14 at 13:46
  • @Krozark Do you mean the dlopen in which I load the so for the first time, or the one I'm using to check if it is unloaded? – Elektito Jun 28 '14 at 13:47
  • @πάνταῥεῖ I'm not sure I understand. They are not in my process? – Elektito Jun 28 '14 at 13:47
  • Which handle do you really want to close? In the first iteration you close the `handles[name]` handle, then in the next iteration that handle is invalid, but you still use the (now) invalid handle. You never reassign it to another (valid) handle. – Some programmer dude Jun 28 '14 at 13:48
  • @Elektito the first dlopen. – Krozark Jun 28 '14 at 13:50
  • Also, if you re-open the same shared object before `dlclose` you get a second handle to the same shared object. Have you tried to call `dlclose` *before* you try to open it again? – Some programmer dude Jun 28 '14 at 13:50
  • @JoachimPileborg I want to close handles[name]. I understand now. I should `dlclose(handle)` in the next iteration. I did that, still an infinite loop! – Elektito Jun 28 '14 at 13:51
  • @JoachimPileborg Also, the second dlopen is not supposed to really open the file. That doesn't increment the reference count. Does it? – Elektito Jun 28 '14 at 13:52
  • And as for your `dlclose(handles[name]);`, do you check for errors? Do you reassign `handles[name]` to some other handle if the `dlclose` function succeeds? Otherwise you iterate with an invalid handle. Do you check that `dlsym` succeeds? That `dladdr` succeeds? – Some programmer dude Jun 28 '14 at 13:53
  • @JoachimPileborg Yes, I do check for errors. It really succeeds. But now that I look closer at my code, I see I'm still using `handles[name]` in my call to `dlsym`. Is that the culprit? That code is not strictly speaking necessary, but I don't know another way to get the .so filename, unless I keep a mapping in the loader. – Elektito Jun 28 '14 at 13:57
  • could you post a *complete* minimal example? your code starts with `dlclose`... get rid of `name`, etc. – Karoly Horvath Jun 28 '14 at 14:00
  • I would say that if you (successfully) close the handle and then try to use it, it will lead to undefined behavior. Once the handle is closed, you can't use it again. – Some programmer dude Jun 28 '14 at 14:00
  • Dynamic loading is a very weird thing. It's not part of the language at all. At best, you can think of dynamic *loading* as part of the program start; global objects only need to be initialized before the first function *in their TU* is called. But *unloading* is a completely different beast. You can't really "partially exit" the program. In fact, all lifetimes are asymmetric like that: lifetimes can start all over the place, but they all end together (end of block, thread, or program). Dynamic unloading doesn't fit into that model. – Kerrek SB Jun 28 '14 at 14:02
  • @JoachimPileborg: that's right, though if you can still access the symbols from the library, then it really makes you wonder whether dlclose does anything at all (in practice). – Karoly Horvath Jun 28 '14 at 14:03
  • Okay added a more complete snippet. Also fixed the problem of reusing `handles[name]` by keeping a mapping between handles and filenames. – Elektito Jun 28 '14 at 14:08
  • @Elektito: that's quite the opposite of what I asked for. The code just got a lot more complex and you haven't addressed either issue. Do you really need more than 1 module to reproduce the problem? – Karoly Horvath Jun 28 '14 at 14:12
  • Well, this is a rather big system. Actually part of my question was about isolating the problem. I can't really get a minimal example that can be compiled, because it pulls lots of things in. – Elektito Jun 28 '14 at 14:13
  • It's your task to isolate the problem and produce testable code. It's usually done by *starting from scratch*, and not by eliminating some parts of the existing code. – Karoly Horvath Jun 28 '14 at 14:17
  • You're right of course. Perhaps I posted in a hurry. I wanted to ask about what could possibly cause this problem, but it sort of turned into something else, and perhaps the original question was not very suitable for SO either. Thanks for your help...and any new ideas are welcome though! – Elektito Jun 28 '14 at 14:20
  • For what it's worth, I ran a few simple experiments, and everything is working as expected -- when you match dlopen and dlclose counts, destructors of global variables in the library get called, and block-static variables are newly initialized on subsequent load/unload runs. – Kerrek SB Jun 28 '14 at 14:29
  • @Kerrek There have been intermittent attempts to fit dynamic loading into standard C++, e.g. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2407.html. But I think they've faded out. – Alan Stokes Jun 28 '14 at 18:37
  • From the man page: " If the reference count drops to zero *and no other loaded libraries use symbols in it*, then the dynamic library is unloaded." You may be hitting this. IMO this means you cannot rely on dlclose actually unloading the library. – n. m. could be an AI Jun 29 '14 at 05:42
  • I know this is an old question. But I ran into this issue today and solved it in my code. I thought of sharing it here. The library which I had 'dlopen'ed had a lot of C++ symbols exported by default. I added a version script to limit the symbols being exported. This caused the library to get unloaded when 'dlclose' was called. I can't explain why it happened though - the symbols that I un-exported weren't being used in any other component. – Rahul Jan 09 '18 at 11:47

6 Answers


The POSIX standard actually does not require dlclose to ever unload a library from address space:

Although a dlclose() operation is not required to remove structures from an address space, neither is an implementation prohibited from doing so.

Source: The Open Group Base Specifications Issue 6

That means other than invalidating the handle, dlclose is not required to do anything at all.
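Since a successful dlclose says nothing about whether the library actually left the address space, code that depends on a real unload (for example, to get static variables re-initialized, as in the question) is better off checking for it explicitly. Below is a minimal sketch, assuming glibc on Linux; `library_is_loaded` and `close_and_verify_unloaded` are hypothetical helper names. RTLD_NOLOAD is a glibc/BSD extension rather than POSIX, and as far as I know a successful RTLD_NOLOAD open on glibc still counts as a reference that must be dropped with dlclose. That is also why the loop in the question cannot make progress on its own: each dlclose only undoes the dlopen made in the same iteration.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether `path` is currently resident in this
 * process without loading it if it is not. RTLD_NOLOAD is a glibc/BSD
 * extension, not POSIX. A successful open still takes a reference, so it is
 * released again immediately to keep the count balanced. */
bool library_is_loaded(const char *path)
{
    void *h = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (h == NULL)
        return false;   /* not resident */
    dlclose(h);         /* drop the reference taken just above */
    return true;
}

/* Hypothetical helper: closes `handle`, then reports whether the library
 * named `path` really left the address space. false means either dlclose
 * failed (see dlerror()) or the object is still resident, e.g. because of
 * other references, NODELETE, or STB_GNU_UNIQUE symbols. */
bool close_and_verify_unloaded(void *handle, const char *path)
{
    if (dlclose(handle) != 0)
        return false;
    return !library_is_loaded(path);
}
```

After the final dlclose, a single call such as close_and_verify_unloaded(handles[name], filenames[name]) can replace the unbounded loop; a false result is the cue to go looking for NODELETE or UNIQUE symbols.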

Sometimes unloading is also delayed by the system: it just marks the library as "to be removed" and actually performs that operation at some later time (for efficiency, or because it would simply not be possible to perform it right now). However, if you call dlopen again before that has happened, the flag is cleared and the still-loaded library is reused.

In some cases the system knows for sure that some symbols of the library are still in use; in that case it will not unload the library from the address space, to avoid dangling pointers. In other cases the system doesn't know for sure that they are in use, but it also cannot tell for sure that they are not; being safe rather than sorry, it will just never really remove that library from memory.

There are other, more obscure cases depending on the kind of operating system and often also on its version. E.g. a common Linux issue is a library that uses STB_GNU_UNIQUE symbols: such a library is marked as "not unloadable" and thus will simply never be unloaded. See here, here (DF_1_NODELETE means not unloadable) and here. So it can also depend on what symbols, or what kind of symbols, a compiler generates. Try running readelf -Ws on your library and look for objects tagged as UNIQUE.
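The same check can also be automated, e.g. in a test that runs on every build of the plugin. A rough sketch, assuming a 64-bit ELF file and glibc's <elf.h>; `scan_unload_blockers` and `struct unload_blockers` are hypothetical names. It counts .dynsym entries whose binding is STB_GNU_UNIQUE (what readelf -Ws prints as UNIQUE) and checks whether DT_FLAGS_1 already carries DF_1_NODELETE. Because the loader marks a library with UNIQUE symbols as not unloadable at run time, a non-zero count is the warning sign even when the on-disk flag is clear.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result type: what in one ELF64 file can pin it in memory. */
struct unload_blockers {
    size_t unique_symbols;  /* .dynsym entries with STB_GNU_UNIQUE binding */
    int nodelete;           /* 1 if DT_FLAGS_1 has DF_1_NODELETE set */
};

/* Returns 0 on success, -1 if the file can't be mapped or isn't ELF64. */
int scan_unload_blockers(const char *path, struct unload_blockers *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    const unsigned char *base = map;
    const Elf64_Ehdr *eh = map;
    int rc = -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shentsize == sizeof(Elf64_Shdr) &&
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) <= size) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        out->unique_symbols = 0;
        out->nodelete = 0;
        for (size_t i = 0; i < eh->e_shnum; i++) {
            if (sh[i].sh_offset + sh[i].sh_size > size)
                continue;               /* skip sections not backed by the file */
            if (sh[i].sh_type == SHT_DYNSYM) {
                const Elf64_Sym *sym = (const Elf64_Sym *)(base + sh[i].sh_offset);
                for (size_t j = 0; j < sh[i].sh_size / sizeof(Elf64_Sym); j++)
                    if (ELF64_ST_BIND(sym[j].st_info) == STB_GNU_UNIQUE)
                        out->unique_symbols++;
            } else if (sh[i].sh_type == SHT_DYNAMIC) {
                const Elf64_Dyn *dyn = (const Elf64_Dyn *)(base + sh[i].sh_offset);
                for (size_t j = 0; j < sh[i].sh_size / sizeof(Elf64_Dyn); j++) {
                    if (dyn[j].d_tag == DT_NULL)
                        break;          /* end of the dynamic array */
                    if (dyn[j].d_tag == DT_FLAGS_1 && (dyn[j].d_un.d_val & DF_1_NODELETE))
                        out->nodelete = 1;
                }
            }
        }
        rc = 0;
    }
    munmap(map, size);
    return rc;
}
```

Comparing the result for the gcc and the clang build of the plugin may show whether only the gcc build carries UNIQUE symbols, which would be consistent with the question's EDIT.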

In general, you cannot really rely on dlclose to work as you might expect. In practice, over the last ten years I have seen it "fail" more often than "succeed" (well, it never really failed; it just often did not unload the library from memory, yet it still behaved as the standards require).

Mecki
  • Very helpful answer! A hint for people having this issue because of "STB_GNU_UNIQUE": with GCC you can set the compile option "-fno-gnu-unique", which avoids this problem. – oLen May 16 '17 at 15:16

This does not answer all of your questions, but it is a solution that can help you avoid problems with dlclose. This question suggests a clue about how to affect the behaviour of reloaded shared libraries: you can use the compiler flag -fno-gnu-unique.

From man pages for gcc / g++:

-fno-gnu-unique

On systems with recent GNU assembler and C library, the C++ compiler uses the "STB_GNU_UNIQUE" binding to make sure that definitions of template static data members and static local variables in inline functions are unique even in the presence of "RTLD_LOCAL"; this is necessary to avoid problems with a library used by two different "RTLD_LOCAL" plugins depending on a definition in one of them and therefore disagreeing with the other one about the binding of the symbol. But this causes "dlclose" to be ignored for affected DSOs; if your program relies on reinitialization of a DSO via "dlclose" and "dlopen", you can use -fno-gnu-unique.

Whether -fno-gnu-unique is used by default or not depends on how GCC has been configured: --disable-gnu-unique-object enables this flag by default, --enable-gnu-unique-object disables it.

scrutari

On Windows, use the equivalent functions, selecting between the two APIs with #ifdef (Windows vs. Linux); the example after the list is the Linux (dlfcn) side:

  • LoadLibrary() = dlopen()
  • FreeLibrary() = dlclose()
  • GetProcAddress() = dlsym()

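#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *handle;
    double (*cosine)(double);
    char *error;

    /* A bare soname lets the loader search its usual paths;
       a hard-coded /lib/libm.so.6 does not exist on every distribution. */
    handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }

    dlerror();  /* clear any stale error before calling dlsym */
    *(void **)(&cosine) = dlsym(handle, "cos");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    printf("%f\n", (*cosine)(2.0));
    dlclose(handle);
    return 0;
}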
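The answer mentions choosing between the two APIs with #ifdef but only shows the Linux side. A minimal sketch of such a wrapper; `lib_handle`, `lib_open`, `lib_sym` and `lib_close` are hypothetical names, and `_WIN32` is the macro Windows compilers predefine. FreeLibrary, like dlclose, only decrements a reference count, so the unloading caveats discussed on this page apply on both sides.

```c
/* Hypothetical wrapper: one small API over LoadLibrary (Windows) and dlopen (POSIX). */
#ifdef _WIN32
#include <windows.h>
typedef HMODULE lib_handle;

static lib_handle lib_open(const char *path) { return LoadLibraryA(path); }

static void *lib_sym(lib_handle h, const char *name)
{
    /* GetProcAddress returns FARPROC; converting it to void * is the usual idiom. */
    return (void *)GetProcAddress(h, name);
}

/* FreeLibrary returns nonzero on success; map that to dlclose's 0-on-success convention. */
static int lib_close(lib_handle h) { return FreeLibrary(h) ? 0 : -1; }
#else
#include <dlfcn.h>
typedef void *lib_handle;

static lib_handle lib_open(const char *path) { return dlopen(path, RTLD_NOW | RTLD_LOCAL); }
static void *lib_sym(lib_handle h, const char *name) { return dlsym(h, name); }
static int lib_close(lib_handle h) { return dlclose(h); }   /* 0 on success */
#endif
```

Callers then use lib_open / lib_sym / lib_close everywhere and keep the #ifdef in one place.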
סטנלי גרונן

There are a lot of quirks to dynamic library loading. Relying on the OS to initialize static variables is fraught with problems. You're much better off either avoiding it altogether or using a plugin loader which handles all the special cases for you.

I recommend you check out glib modules. GLib provides a platform-independent way of loading dynamic libraries. You can use these callbacks: g_module_check_init(), which GModule calls automatically when the module is loaded, and g_module_unload(), which it calls when the module is unloaded.

They can handle allocating and deallocating any resources. Instead of relying on the OS to initialize statics for you reliably, you can allocate what you need dynamically.

All you need to do is define these functions in your dynamic library and then load and unload it with g_module_open() and g_module_close().

jcoffland
  • glib modules also use dlopen/dlclose on all platforms other than Windows, and if dlclose does not unload the library, which can happen for plenty of reasons on different systems (some actually never unload libraries at all; dlclose is just an empty function there), then using glib modules will buy you exactly nothing on that platform. – Mecki Nov 20 '14 at 00:24

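This can be fixed (perhaps not in all scenarios) by calling dlopen with RTLD_LOCAL.

I was experiencing the same issue: a destructor was not being called. But if I open the shared object with RTLD_LOCAL, dlclose behaves as expected and calls the destructor. (Per the Linux dlopen(3) man page, RTLD_LOCAL is also the default when neither RTLD_GLOBAL nor RTLD_LOCAL is given, so the change matters when the code was passing RTLD_GLOBAL.)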

BLuFeNiX

On my WSL setup, I had an issue where dlclose didn't invoke the library's destructor, and the library's "direct_refcount" just kept rising after each load, regardless of how many times I called dlclose on the same handle.

However, it was fixed when I changed the dlsym call to use the handle returned from dlopen instead of RTLD_DEFAULT. I had assumed I could just let it search the global scope to find that symbol anyway.

I don't know what the difference is, but that resolved this specific issue on my setup.

Setup: WSL, Ubuntu 20, GCC 9.4, GNU ld 2.34, GLIBC 2.31.
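A minimal sketch of the change described above, assuming glibc; `plugin_symbol` is a hypothetical helper name. The lookup goes through the handle returned by dlopen instead of RTLD_DEFAULT (an extension that searches the global scope), and dlerror() is cleared before and checked after dlsym, as the dlsym man page recommends, since a NULL return alone does not prove failure.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: resolve `name` in the library behind `handle` only,
 * instead of searching the global scope with RTLD_DEFAULT. Returns NULL and
 * prints the loader's message if the lookup fails. */
void *plugin_symbol(void *handle, const char *name)
{
    dlerror();                        /* clear any stale error first */
    void *sym = dlsym(handle, name);  /* not dlsym(RTLD_DEFAULT, name) */
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        return NULL;
    }
    return sym;
}
```

Every lookup is then tied to the handle that the matching dlclose will later release.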

Aviv