1

I have created a shared library (DLL) which contains a singleton class. The singleton class creates a thread on construction, and calls join in the destructor. See code below.

When I use the DLL in another program (main.cpp) and get the singleton instance, the thread is created and runs as expected. When the program terminates, the singleton destructor is called, the thread join is called but the thread does not complete.

The output I get from the program:

MySingleton::runner start 
a
MySingleton::~MySingleton begin.
MySingleton::~MySingleton before calling join()
MySingleton::~MySingleton after calling join()
MySingleton::~MySingleton end.

Expected Output:

MySingleton::runner start 
a
MySingleton::~MySingleton begin.
MySingleton::~MySingleton before calling join()
MySingleton::runner end.
MySingleton::~MySingleton after calling join()
MySingleton::~MySingleton end.

There are some situations where the thread ends as I expect (and I get expected output):

  1. MySingleton::getInstance is defined in the header
  2. The library is compiled as a .lib instead of .dll (Static lib)
  3. MySingleton singleton definition in main body (not singleton)

I cannot figure out why the thread does not end as expected only in some cases, and I don't understand if it is something to do with MSVC, the way static local variables in a static member function are destructed, or if it is something to do with how I create/join the thread or something else altogether.

EDIT

More situations where expected output happens:

  1. Defining volatile bool running_{false} (may not be a proper solution)
  2. Defining std::atomic_bool running_{false} Seems to be the proper way, or using a mutex.

EDIT 2

Using std::atomic for the running_ variable did not work (although code adjusted below to use it as we don't want UB). I accidentally built as a static library when I was testing std::atomic and volatile and as previously mentioned the issue does not occur with a static library.

I have also tried protecting running_ with a mutex and still have the weird behaviour. (I acquire a lock in an while(true) loop, check !running_ to break.)

I have also updated the thread loop below to increment a counter, and the destructor will print this value (showing loop is actually executing).

// Singleton.h
class MySingleton
{

private:
    DllExport MySingleton();
    DllExport ~MySingleton();

public:
    DllExport static MySingleton& getInstance();
    MySingleton(MySingleton const&) = delete;
    void operator=(MySingleton const&)  = delete;

private:
    DllExport void runner();

    std::thread th_;
    std::atomic_bool running_{false};
    std::atomic<size_t> counter_{0};
};
// Singleton.cpp
MySingleton::MySingleton() {
    running_ = true;
    th_ = std::thread(&MySingleton::runner, this);
}
MySingleton::~MySingleton()
{
    std::cout << __FUNCTION__ << " begin." << std::endl;
    running_ = false;
    if (th_.joinable())
    {
        std::cout << __FUNCTION__ << " before calling join()" << std::endl;
        th_.join();
        std::cout << __FUNCTION__ << " after calling join()" << std::endl;
    }
    std::cout << "Count: " << counter_ << std::endl;
    std::cout << __FUNCTION__ << " end." << std::endl;
}

MySingleton &MySingleton::getInstance()
{
    static MySingleton single;
    return single;
}

void MySingleton::runner()
{
    std::cout << __FUNCTION__ << " start " << std::endl;
    while (running_)
    {
        counter_++;
    }
    std::cout << __FUNCTION__ << " end " << std::endl;
}
// main.cpp
int main()
{
    MySingleton::getInstance();

    std::string s;
    std::cin >> s;

    return 0;
}
// DllExport.h
#ifdef DLL_EXPORT
#define DllExport __declspec(dllexport)
#else
#define DllExport __declspec(dllimport)
#endif
cmake_minimum_required(VERSION 3.13)
project("test")

set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_EXTE NSIONS OFF)

add_library(singleton SHARED Singleton.cpp)
target_compile_definitions(singleton PUBLIC -DDLL_EXPORT)
target_include_directories(singleton PUBLIC ./)
install(TARGETS singleton
        EXPORT singleton-config
        CONFIGURATIONS ${CMAKE_BUILD_TYPE}
        ARCHIVE DESTINATION lib
        LIBRARY DESTINATION lib
        )
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_BUILD_TYPE}/
        DESTINATION bin
        FILES_MATCHING
        PATTERN "*.dll"
        PATTERN "*.pdb"
        )

add_executable(main main.cpp )
target_link_libraries(main PUBLIC singleton)
install(TARGETS main RUNTIME DESTINATION bin)
  • What happens if you declare `running_` as `volatile`? – vandench Sep 28 '21 at 16:45
  • @vandench this works! As an aside, I tried std::atomic and this did not work. Edit: based on [this question and answer](https://stackoverflow.com/questions/4437527/why-do-we-use-volatile-keyword) I assume this was a compiler optimization that 'caused' the issue. – Braiden Cutforth Sep 28 '21 at 16:48
  • Given that volatile gets the program to exit as expected, my follow up question is why did `join` not block and stop the program from ending if the thread was in an infinite loop? – Braiden Cutforth Sep 28 '21 at 16:55
  • The optimizer likely realized that it did nothing, and could simply be removed. That's what every compiler seems to do: https://gcc.godbolt.org/z/Was6K5xcd – vandench Sep 28 '21 at 17:08
  • 4
    An infinite loop where nothing happens is [undefined behavior](https://stackoverflow.com/questions/41320725/is-infinite-loop-still-undefined-behavior-in-c-if-it-calls-shared-library). Reading from and writing to the same variable from multiple threads without synchronization is _also_ undefined behavior. _Any_ program output is acceptable from the perspective of the C++ standard, so the compiler is free to do weird things like "print the first line from runner but not the last". – Nathan Pierson Sep 28 '21 at 17:12
  • 3
    NO - don't use `volatile` for this. You are getting tricked into thinking that you have something working. `std::atomic running_;` would be a proper way. Never use `volatile` unless you have memory connected to hardware that may change the values without the compiler having any knowledge about it. – Ted Lyngmo Sep 28 '21 at 17:14
  • Show the code using `std::atomic`, please. That *should* work (although using a condition variable is usually a better solution). – numzero Sep 28 '21 at 17:22
  • @NathanPierson according to the referenced answer, it isn’t UB if the variable is atomic, is it? – numzero Sep 28 '21 at 17:25
  • Another detail: If you start a `std::thread` in the constructor, make _very_ sure to mark the class `final` - unless the thread is only allowed to interact with the base class members (which would be unusual). Also, what is `Server::runner`? Did you mean `MySingleton::runner`? – Ted Lyngmo Sep 28 '21 at 17:28
  • 1
    @numzero That's why I had the qualifier "without synchronization". – Nathan Pierson Sep 28 '21 at 17:35
  • @NathanPierson Is it undefined behaviour to read from one thread and write from another with no synchronization? I just assumed the thread would _eventually_ see the updated value and end. – Braiden Cutforth Sep 28 '21 at 17:36
  • 1
    @BraidenCutforth Yes and no, there is no such guarantee. The compiler could even compile the value into the thread code with no chance of changing - just because no synchronization was provided. – Ted Lyngmo Sep 28 '21 at 17:37
  • 1
    @TedLyngmo std::atomic worked. (and yeah it was supposed to be MySingleton::runner, fixed in edit) – Braiden Cutforth Sep 28 '21 at 17:38
  • 1
    @BraidenCutforth Great. You should probably make the default constructor and destructor `private` too. Noone should use those from "the outside" anyway. – Ted Lyngmo Sep 28 '21 at 17:39
  • Is there a reason the thread join returns instead of hanging waiting for the thread to end with the original code written? If the thread doesn't end as expected, I thought join would hang indefinitely – Braiden Cutforth Sep 28 '21 at 17:42
  • 2
    I didn't analyze it closer since it has undefined behavior. Anything could happen. – Ted Lyngmo Sep 28 '21 at 17:44
  • 3
    With a non-atomic `bool`? The compiler is 100% within its rights to say "I see a loop that does absolutely nothing. Optimizer! Exterminate!!!" – user4581301 Sep 28 '21 at 17:48
  • 1
    @user4581301 This issue occurs with a very non-empty loop (locking mutex for shared queue access etc.). The above code is just as small as possible, but I understand this can have some different implications. – Braiden Cutforth Sep 28 '21 at 17:51
  • During progress termination all bets are off - DLLs are being unloaded, threads are being terminated - have a read of (and the contained links) see how bad it is - https://devblogs.microsoft.com/oldnewthing/20100122-00/?p=15193 – Richard Critten Sep 28 '21 at 18:07
  • With a non-thread aware exit condition the compiler is allowed to generate code that assumes the condition will not change. Typically this locks the code in the loop forever rather than never entering at all, but at the end of the day the behaviour of undefined behaviour is undefined. If you're interested in running down exactly what decisions are being made, prove that the loop enters at all. Maybe examining the generated assembly will provide information. – user4581301 Sep 28 '21 at 18:07
  • 1
    This phrase appears in your question "the way static variables in a static function are destructed". There is a huge difference between static member functions and static functions, and between static member variables and static variables. You appear to be talking about a static local variable in a static member function? Try rewriting that part for clarity and precision. – Ben Voigt Sep 28 '21 at 18:25
  • @numzero I made a mistake in my earlier assertion. `std::atomic`, `volatile` and protecting with a mutex do not help. `th_.join()` will still return before the thread seems to end. – Braiden Cutforth Sep 28 '21 at 18:44
  • Where is the `"a\n"` line coming from? I don't see where it is being printed. – Martin York Sep 28 '21 at 18:46
  • @TedLyngmo I think I have removed the undefined behaviour that was mentioned, and the thread still isn't joining like I expect (even using std::atomic or a mutex). Any additional thoughts? – Braiden Cutforth Sep 28 '21 at 18:46
  • @MartinYork that is me typing for the `std::cin >> s` to unblock. If it is confusing I can certainly remove. -- edit Realized I never included `main.cpp` in question... – Braiden Cutforth Sep 28 '21 at 18:47
  • @BraidenCutforth It joins fine and the program exits correctly for me after I've entered something in the `std::cin >> s;` (just pressing return isn't enough). Can you see the singleton destructor being called at all? Does it print the first line in the destructor? Does the program terminate? – Ted Lyngmo Sep 28 '21 at 18:59
  • @TedLyngmo program terminates, and I get all the output from the singleton destructor, except when calling `th_.join()` I do not see the final output from the `MySingleton::runner` method (running in thread). – Braiden Cutforth Sep 28 '21 at 19:15
  • @BraidenCutforth Is that when you've put it in a library or something? I don't get the same behavior if I just take your code and compile as a console application. – Ted Lyngmo Sep 28 '21 at 19:27
  • @TedLyngmo error occurs when I make MySingleton.cpp (and associated headers) into a Dll. Then I use the Dll in another application (this case -> main.cpp) and the issue occurs. If I make it a static library, compile it with the main project (console app for example) then the issue does not occur. – Braiden Cutforth Sep 28 '21 at 19:29
  • 1
    @BraidenCutforth Yeah, I just noticed the comment to the answer. I suggest creating a wrapper class that you _don't_ put in the DLL. It could be in the header file that you ship with the DLL. Make that class a `friend` of the singleton and let that one have the `getInstance()` method. – Ted Lyngmo Sep 28 '21 at 19:29
  • @BraidenCutforth I whipped up an answer as a suggested workaround. I hope that will make it work as you want. – Ted Lyngmo Sep 28 '21 at 19:44
  • 1
    I am not sure what the current standard says so this could be out of date info. But generally you should make sure all child threads are correctly ended before the main() thread ends (i.e. main thread exists the main() function). Otherwise you are entering undefined behavior. Your code currently has the singelton destrucotr hapenning after main exits and thus the thread has probably already been destroyed by the OS. – Martin York Sep 28 '21 at 20:12
  • @MartinYork I'm pretty sure that the order of the destructors called when a program ends is well defined (as in, called in the reverse order the objects were created) when one uses lazy initialization like OP does. The only problem here is that the object in question resides statically in a DLL and Windows doesn't call those objects desctrucors. As long as the object is created within "the program itself" (for a lack of a better description) it should never be a problem, in Windows or otherwise. In those cases, it _should_ wait for the thread to finish - even after `main`. – Ted Lyngmo Sep 28 '21 at 21:35
  • 1
    @TedLyngmo I understand the well-ordered nature of object destruction. I am not getting at that. This is more about how the OS handles "Threads of Execution". I am not well versed in the current standard and what it specifies, so this could have totally been solved (accurately specified, but a quick glance at the standard does not tell me anything). But in C++11 days (before threading was incorporated into the standard) you could not guarantee compiler to compiler how threads would act past the end of the main thread. – Martin York Sep 28 '21 at 22:05

2 Answers2

2

The problem seems to be that the thread is terminated abruptly by ExitProcess (called implicitly when main returns IIRC).

Exiting a process causes the following:

  1. All of the threads in the process, except the calling thread, terminate their execution without receiving a DLL_THREAD_DETACH notification.
  2. The states of all of the threads terminated in step 1 become signaled.
  3. The entry-point functions of all loaded dynamic-link libraries (DLLs) are called with DLL_PROCESS_DETACH.
  4. After all attached DLLs have executed any process termination code, the ExitProcess function terminates the current process, including the calling thread.
  5. The state of the calling thread becomes signaled.
  6. All of the object handles opened by the process are closed.
  7. The termination status of the process changes from STILL_ACTIVE to the exit value of the process.
  8. The state of the process object becomes signaled, satisfying any threads that had been waiting for the process to terminate.

The runner thread is terminated in (1) while the destructor is only called in (3).

numzero
  • 2,009
  • 6
  • 6
  • I confirmed this is the issue. I wrote a simple DllMain function and when DLL_PROCESS_DETACH (step 3) occurs is when the destructor for `MySingleton` is called. – Braiden Cutforth Sep 28 '21 at 19:27
0

The accepted answer explains what happens when you put your threaded singleton in a DLL so the question has been answered.

Here's a suggestion how to circumvent this behavior.

Your DLL.hpp

// A wrapper that is bound to be compiled into the users binary, not into the DLL:
template<class Singleton>
class InstanceMaker {
public:
    static Singleton& getInstance() {
        static Singleton instance;
        return instance;
    }
};

class MySingleton {
private:
    DllExport MySingleton();
    DllExport ~MySingleton();

public:
    // no getInstance() in here
    friend class InstanceMaker<MySingleton>;      // made a friend

    MySingleton(MySingleton const&) = delete;
    void operator=(MySingleton const&)  = delete;

private:
    DllExport void runner();

    std::thread th_;
    std::atomic_bool running_{false};
    std::atomic<size_t> counter_{0};
};

using FancyName = InstanceMaker<MySingleton>;

The user of the DLL would now use

    auto& instance = FancyName::getInstance();

and the destruction should occur before the thread is reaped by ExitProcess.

Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108