3

I have the following code that runs functions on a dedicated thread. It works great except for the destructor. The call to thread_.join() does not return. I am using VS2013 Express.

What would I change so that the thread joins correctly?

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

namespace
{
    class main_thread
    {
    public:
        static auto instance() -> main_thread&
        {
            static main_thread instance_;
            return instance_;
        }
        auto enque(std::function<void()> func) -> void
        {
            {
                std::lock_guard<std::mutex> lock{ mutex_ };
                queue_.push_back(func);
            }
            condition_.notify_one();
        }
    private:
        main_thread()
        {
            continue_.test_and_set();
            thread_ = std::thread{ std::bind(std::mem_fn(&main_thread::run), this) };
        }
        ~main_thread()
        {
            continue_.clear();
            condition_.notify_all();
            if (thread_.joinable())
            {
                thread_.join();
            }
        }
        main_thread(const main_thread &other) = delete;
        main_thread(main_thread &&other) = delete;
        main_thread& operator=(const main_thread &other) = delete;
        main_thread& operator=(main_thread &&other) = delete;

        auto run() -> void
        {
            while (continue_.test_and_set())
            {
                auto lock = std::unique_lock<std::mutex>{ mutex_ };
                //condition_.wait_for(lock, std::chrono::milliseconds(1));
                condition_.wait(lock);
                for (auto &func : queue_)
                {
                    func();
                }
                queue_.clear();
            }
        }

        std::condition_variable condition_;
        std::mutex mutex_;
        std::vector<std::function<void()>> queue_;
        std::thread thread_;
        std::atomic_flag continue_;
    };
}

auto on_main_thread(std::function<void()> func) -> void
{
    main_thread::instance().enque(std::move(func));
}

auto on_main_thread_sync(std::function<void()> func) -> void
{
    bool done{ false };
    on_main_thread([&]{
        func();
        done = true;
    });
    while (!done);
}

The only function exercising this code is

int main()
{
    on_main_thread([]{});
}

This avoids the issue of the race in on_main_thread_sync but still has the lockup in ~main_thread. Visual Studio indicates that there are 2 threads, but neither is in main_thread::run, so I do not understand what is going on. That function exited correctly, but it for some reason the thread is not ending.

Graznarak
  • 3,626
  • 4
  • 28
  • 47
  • 1
    There is a data race on `done` in `on_main_thread_sync`, it needs to be `std::atomic`. I'm not sure if it's *the* problem, but it is *a* problem. – Casey Nov 14 '13 at 06:27
  • I agree that that is a problem. I noticed that as I was posting this. However, I do not believe that it is causing the particular problem that I was having. – Graznarak Nov 14 '13 at 06:58
  • possible duplicate of [std::thread.join() deadlock](http://stackoverflow.com/questions/17191894/stdthread-join-deadlock) – Robert Jørgensgaard Engdahl Nov 14 '13 at 07:58
  • Marked as duplicate. I am not sure if that is the right way to do it, but you seem to have the same problem as this guy: http://stackoverflow.com/questions/17191894/stdthread-join-deadlock – Robert Jørgensgaard Engdahl Nov 14 '13 at 07:59
  • That does fix the issue. Is this a Microsoft specific issue or do GCC and/or Clang do the same thing? – Graznarak Nov 14 '13 at 15:57
  • This appears to be a known Visual Studio bug. http://connect.microsoft.com/VisualStudio/feedback/details/747145/ – Graznarak Nov 14 '13 at 16:56

2 Answers2

1

You should not call external code from within a critical section of code, this can easily lead to deadlocks.

If you pause execution in the debugger, you may see that you have one or more threads waiting to acquire _mutex.

You won't be able to acquire the unique_lock on _mutex again if any of the code called from func() tries to enqueue().

Try releasing the lock once the condition_variable wait is over. As a test, you can put in an extra scope to see if this helps:

while (continue_.test_and_set())
{
    std::vector<std::function<void()>> queue;
    {
        auto lock = std::unique_lock<std::mutex>{ mutex_ };
        //condition_.wait_for(lock, std::chrono::milliseconds(1));
        condition_.wait(lock);
        queue.swap(queue_);
    }
    for (auto &func : queue)
    {
        func();
    }
}
Phillip Kinkade
  • 1,382
  • 12
  • 18
  • You've introduced a data race by accessing `queue_` outside of `mutex_`. – Casey Nov 14 '13 at 06:51
  • Thanks for noticing that, @Casey. I've updated the solution to only access queue_ while mutex_ is acquired. – Phillip Kinkade Nov 14 '13 at 06:58
  • This is another issue that I had missed. I appreciate you noticing it. I have hit this type of deadlock at work, and you would think that I would know better by now. – Graznarak Nov 14 '13 at 07:00
  • I implemented this change and it still has the same behavior. – Graznarak Nov 14 '13 at 07:10
  • When you pause in the debugger, what are your threads waiting for other than the join? – Phillip Kinkade Nov 14 '13 at 07:14
  • Nothing that I can see. The thread running `main()` is waiting on the join. The other thread has dropped out of any code that I have and just shows an address. – Graznarak Nov 14 '13 at 15:47
  • There's nothing wrong with your join(). It's properly waiting for the other thread to complete. You need to look up the stack on the other thread, unless you have some corruption happening in func(), you'll at least see the call to func() somewhere in your stack. – Phillip Kinkade Nov 14 '13 at 17:58
  • ...or the stack may show the other thread is in enqueue(), waiting to acquire the mutex. Maybe you could post the backtrace of the other/stuck thread here? – Phillip Kinkade Nov 14 '13 at 18:24
1

You have a potential live-lock in your code at shutdown. The following interleaving is possible:

main() thread      thread in run()
                   check continue_, see it is true
set continue_ = false
notify the condition variable
join
                   wait on condition variable

To avoid this, you need the condition check and cv wait to happen atomically. This is most easily accomplished by protecting continue_ with mutex_ (Live at Coliru):

class main_thread
{
public:
    static auto instance() -> main_thread&
    {
        static main_thread instance_;
        return instance_;
    }
    auto enque(std::function<void()> func) -> void
    {
        {
            std::lock_guard<std::mutex> lock{ mutex_ };
            queue_.push_back(func);
        }
        condition_.notify_one();
    }
private:
    main_thread() : continue_{true}
    {
        thread_ = std::thread{ &main_thread::run, this };
    }
    ~main_thread()
    {
        {
            std::lock_guard<std::mutex> lock{ mutex_ };
            continue_ = false;
        }
        condition_.notify_all();
        if (thread_.joinable())
        {
            thread_.join();
        }
    }

    auto run() -> void
    {
        std::unique_lock<std::mutex> lock{ mutex_ };
        while(continue_)
        {
            if(queue_.empty())
            {
                condition_.wait(lock);
                continue;
            }

            std::vector<std::function<void()>> queue;
            queue.swap(queue_);
            lock.unlock();
            for (auto &func : queue)
            {
                func();
            }
            lock.lock();
        }
    }

    std::condition_variable condition_;
    std::mutex mutex_;
    std::vector<std::function<void()>> queue_;
    bool continue_;
    std::thread thread_;
};
Casey
  • 41,449
  • 7
  • 95
  • 125