We have a server that must be updatable without downtime.
We achieve this by making the application only a thin loader layer; all the logic lives in a shared object which is dlopen-ed by the application. Let's call this library libmyservice.so.1.23.
When a request comes in, the server creates a thread and calls the appropriate APIs from the library to serve it.
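Simplified, the loader side is something like this (a sketch; the myservice_handle_request entry-point name is made up for illustration):

```cpp
#include <dlfcn.h>
#include <stdexcept>
#include <thread>

// Signature of the (made-up) request entry point exported by the core library.
using handle_request_fn = void (*)(int request_fd);

int main() {
    // RTLD_LOCAL keeps the library's symbols out of the global scope, so a
    // newer version can later be loaded next to it.
    void* lib = dlopen("libmyservice.so.1.23", RTLD_NOW | RTLD_LOCAL);
    if (!lib)
        throw std::runtime_error(dlerror());

    auto handle_request = reinterpret_cast<handle_request_fn>(
        dlsym(lib, "myservice_handle_request"));
    if (!handle_request)
        throw std::runtime_error(dlerror());

    // Accept loop (stubbed): one thread per incoming request.
    int request_fd = 0;  // would come from accept()
    std::thread(handle_request, request_fd).detach();

    // ... shutdown handling and dlclose(lib) omitted ...
}
```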
When the server needs to be hot updated, it downloads a new set of libraries and loads them with dlopen. Let's call the new library libmyservice.so.1.24.
During the update there is an intermediate period where the already running requests are still served by the old library while new requests are served by the new one. When the old requests finish, the old library is unloaded and all requests use the new library, so there is no downtime during the update.
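Conceptually the swap works like the following sketch: each request pins the library version it started with through a shared_ptr, and the old version is dlclose-d only when the last in-flight request using it finishes (names are again made up):

```cpp
#include <dlfcn.h>
#include <memory>
#include <string>

// RAII wrapper: dlclose() runs when the last shared_ptr to a version goes away.
class Library {
public:
    explicit Library(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {}
    Library(const Library&) = delete;
    ~Library() { if (handle_) dlclose(handle_); }   // old version unloads here
    void* sym(const char* name) const { return handle_ ? dlsym(handle_, name) : nullptr; }
private:
    void* handle_;
};

std::shared_ptr<Library> g_current;   // the version new requests should use

// Called by the update logic: new requests switch over immediately,
// requests already running keep the shared_ptr they grabbed earlier.
void hot_swap(const std::string& new_path) {
    std::atomic_store(&g_current, std::make_shared<Library>(new_path));
}

// Called once per request (from the request thread).
void serve_request(int request_fd) {
    std::shared_ptr<Library> lib = std::atomic_load(&g_current);  // pin old or new
    if (!lib)
        return;
    auto handle = reinterpret_cast<void (*)(int)>(lib->sym("myservice_handle_request"));
    if (handle)
        handle(request_fd);
}   // lib released here; if it was the last user of the old version, it is dlclose-d
```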
The library is compiled to be as self-contained as possible. It depends on Boost, OpenSSL, and many other C++ libraries we don't control. All of these dependencies are shipped along with the library, and an rpath is used to load them from the same directory as the library.
In practice we ran into two problems:
Symbol conflicts: when the new library is loaded, the dynamic linker reuses symbols from the old one, which can cause weird bugs. We found that this can be worked around by passing RTLD_DEEPBIND to dlopen. But there is also dlmopen, which does something similar. Which of the two should be used when the goal is to completely isolate the old and new sets of libraries?
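For illustration, this is how the two options are invoked, as far as I understand them: RTLD_DEEPBIND stays in the same namespace but makes the new library resolve symbols against its own dependencies first, while dlmopen with LM_ID_NEWLM puts the new library and its whole dependency tree into a separate link-map namespace. Both are glibc extensions as far as I know:

```cpp
// dlmopen() and RTLD_DEEPBIND are glibc extensions (g++ on Linux defines
// _GNU_SOURCE by default, which makes both visible in <dlfcn.h>).
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Option A: same link-map namespace, but the new library resolves symbols
    // against its own dependency tree before anything already loaded.
    void* deep = dlopen("libmyservice.so.1.24",
                        RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);

    // Option B: a brand-new link-map namespace; the library and all of its
    // dependencies (its own libstdc++, Boost, OpenSSL, ...) are loaded again,
    // fully separated from the copies in the base namespace.
    void* isolated = dlmopen(LM_ID_NEWLM, "libmyservice.so.1.24",
                             RTLD_NOW | RTLD_LOCAL);

    if (!deep || !isolated)
        std::fprintf(stderr, "load failed: %s\n", dlerror());
}
```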
Static initialization and cleanup: we had a bug where using global objects such as std::cerr from the dlopened library caused a crash when the library was loaded with RTLD_DEEPBIND. We still don't fully understand why this happens. We believe that static initialization doesn't happen when a new set of libstdc++ symbols is loaded with RTLD_DEEPBIND. This seems like a bug in the dynamic linker to me. How are static initialization and cleanup supposed to work in shared objects?
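To make the question concrete, this is the behaviour I would expect from a minimal probe (file and type names are made up): the constructor of a global object runs while dlopen loads the object, and its destructor runs when the object is dlclose-d (or at process exit):

```cpp
// probe.cpp -- built into a (made-up) libprobe.so, e.g.:
//   g++ -shared -fPIC probe.cpp -o libprobe.so
#include <cstdio>

struct Probe {
    Probe()  { std::puts("static init: constructor runs while dlopen() maps the object"); }
    ~Probe() { std::puts("static cleanup: destructor runs at dlclose() (or process exit)"); }
};
// The constructor is invoked from the object's .init_array; the destructor is
// registered with __cxa_atexit against this DSO, so dlclose() triggers it.
static Probe g_probe;

// loader.cpp -- observes when init/cleanup actually happen
#include <dlfcn.h>

int main() {
    void* h = dlopen("./libprobe.so", RTLD_NOW | RTLD_LOCAL);  // "static init" prints here
    if (h)
        dlclose(h);                                            // "static cleanup" prints here
}
```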
How can I load two shared libraries and all their dependencies simultaneously, without any conflict between the two? Is it even possible? That is, no symbol conflicts and no conflicts in static initialization.
The software already works properly on Windows, because DLLs don't seem to conflict with each other the way shared objects do.
EDIT:
Although I like the idea of multiple processes, the architecture is already decided and I can't change it without many approvals (which I'm unlikely to get).
To complicate matters (which I didn't want to get into in the original question), the actual architecture is like this: server application -> main library -> hot-updatable core libraries. The "main library" is the product: a proxy above the hot-swappable core libraries. The customer gets this library and integrates it into their product. During an update, the customer's product downloads the update package and calls a function in the main library to trigger the hot swap. The main library doesn't create the threads, but it is implemented to be thread-safe.
So multiple processes are not really an option.
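For reference, the main library's surface is conceptually something like this sketch (function and type names are made up, the real API is larger): an exported, thread-safe function the customer calls to trigger the swap, plus request entry points that pin whichever core library is currently active:

```cpp
#include <dlfcn.h>
#include <memory>
#include <mutex>
#include <string>

// Minimal RAII wrapper around one hot-swappable core library
// (same idea as in the earlier loader sketch).
class CoreLibrary {
public:
    explicit CoreLibrary(const std::string& path)
        : handle_(dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL)) {}
    CoreLibrary(const CoreLibrary&) = delete;
    ~CoreLibrary() { if (handle_) dlclose(handle_); }
    bool ok() const { return handle_ != nullptr; }
private:
    void* handle_;
};

namespace {
std::mutex g_mutex;                   // the main library owns no threads, only this lock
std::shared_ptr<CoreLibrary> g_core;  // currently active core library
}

// Exported to the customer's product: called after the update package has been downloaded.
extern "C" int myservice_hot_swap(const char* new_core_path) {
    auto next = std::make_shared<CoreLibrary>(new_core_path);
    if (!next->ok())
        return -1;                    // keep serving with the old core
    std::lock_guard<std::mutex> lock(g_mutex);
    g_core = next;                    // requests already in flight keep their own shared_ptr
    return 0;
}

// Exported request entry point; may be called concurrently from the customer's threads.
extern "C" int myservice_handle(const void* request, void* response) {
    std::shared_ptr<CoreLibrary> core;
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        core = g_core;                // pin whichever core version is current right now
    }
    if (!core)
        return -1;
    // ... dlsym() the core library's handler and forward the call ...
    (void)request;
    (void)response;
    return 0;
}
```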