Why does my Wait method hang indefinitely?

Question

I am using this snippet of code to handle a pool of threads. After calling RunTask a number of times (where the task in question is a method that sleeps for 5 seconds), I then call Wait() in an attempt to wait for all the threads to finish. The code hangs in this Wait() mehhod. I suspect its my management of the condition variable, but I can't figure it out.

class lthThreadHandler {
public:
    lthThreadHandler(int pool_size)
    {
        mAvailable = pool_size;
        mRunning = true;
        for (int i = 0; i < pool_size; ++i)
            mThreads.create_thread(boost::bind(&lthThreadHandler::pool_main, this));
    }

    template<typename Task>
    void RunTask(Task task)
    {
        boost::unique_lock<boost::mutex> lock(mMutex);

        // If no threads are available, then return.
        if (0 == mAvailable)
            return;

        // Decrement count, indicating thread is no longer available.
        --mAvailable;

        // Set task and signal condition variable so that a worker thread will
        // wake up andl use the task.
        mTasks.push(boost::function<void()>(task));
        mCondition.notify_one();
    }

    void Wait() {
        try {
            mThreads.join_all();
        }
        // Suppress all exceptions.
        catch (...) {
        }
    }

    /// @brief Entry point for pool threads.
    void pool_main() {
        while (mRunning) {
            // Wait on condition variable while the task is empty and the pool is
            // still running.
            boost::unique_lock<boost::mutex> lock(mMutex);
            while (mTasks.empty() && mRunning) {
                mCondition.wait(lock);
            }
            // If pool is no longer running, break out.
            if (!mRunning)
                break;

            // Copy task locally and remove from the queue.  This is done within
            // its own scope so that the task object is destructed immediately
            // after running the task.  This is useful in the event that the
            // function contains shared_ptr arguments bound via bind.
            {
                boost::function<void()> task = mTasks.front();
                mTasks.pop();

                lock.unlock();

                // Run the task.
                try {
                    task();
                }
                // Suppress all exceptions.
                catch (...) {
                }
            }

            // Task has finished, so increment count of available threads.
            lock.lock();
            ++mAvailable;
        } // while mRunning
    }
private:
    std::queue<boost::function<void()> > mTasks;
    boost::thread_group mThreads;
    std::size_t mAvailable;
    boost::mutex mMutex;
    boost::condition_variable mCondition;
    bool mRunning;
};

This code snippet come from another SO answer: Thread pool using boost asio

Do you have access to C++11? You may want an atomic boolean. — AndyG, Jan 19 '16 at 03:00
Does `mCondition.wait(lock);` keep `mMutex` locked for a long time (or indefinitely)? Something like that could cause trouble. — J.J. Hakala, Jan 19 '16 at 03:08
@AndyG mRunning gets set to false in the destructor, not shown here — , Jan 19 '16 at 03:19
Then how do you make shure, that the destructor is called? When is it called? Have you placed a breakpoint, where running is set to false? - because your wait is not hanging indefinitely, your condition to leave the wait part is never fullfulled! — EGOrecords, Jan 19 '16 at 03:55
`catch(...)` without rethrowing is a stupid idea, in particular on MS Windows, where it catches some other errors that are not C++ exceptions. Replace it with `catch(exception const&)` and log the error. That said, provide a minimal but complete example! As per the posting guidelines, your question is off-topic without it. — Ulrich Eckhardt, Jan 19 '16 at 06:04

score 0 · Answer 1 · edited May 23 '17 at 10:28

Wait hangs indefinitely because the threads all run an infinite loop as long as you don't clear mRunning. Even if you did it might change your pet into a bag of popcorn because of the data race.

So. Basically you butchered someone's perfectly fine taskqueue/threadpool implementation. Mostly by renaming things idly¹ and then

removing the crucial destructor
adding a Wait() that just joins threads without doing anything to cause these threads to finish.
adding race conditions on the mRunning flag

Even if you put the original destructor logic into Wait() keep in mind you'll actually have a race condition where there is no guarantee that the posted tasks will run before the (main) thread resets the mRunning flag.

What Now?

I suggest you don't meddle with the code.

If there is something you need it to do, ask a constructive question like "How can I make sure that XXX" instead of "Why does my broken code not YYY"
If you want to learn ask about the original implementation, or about a single isolated change

Alternatives

I have a few thread similar pools that you might find easier to understand on [SO] so you could compare:

Calculating the sum of a large vector in parallel (links to two implementations)

I answered a similar question once before (in response to the fact that you're trying to join_all from a member function instead of the destructor:

asio::io_service and thread_group lifecycle issue

That answer could be very helpful to you explaining why that kind of function is tricky.

¹ pool_main, really? Thread manager, really? All the names were originally better chosen.

Why does my Wait method hang indefinitely?

1 Answers1

What Now?

Alternatives