0

I am using this snippet of code to handle a pool of threads. After calling RunTask a number of times (where the task in question is a method that sleeps for 5 seconds), I then call Wait() in an attempt to wait for all the threads to finish. The code hangs in this Wait() mehhod. I suspect its my management of the condition variable, but I can't figure it out.

class lthThreadHandler {
public:
    lthThreadHandler(int pool_size)
    {
        mAvailable = pool_size;
        mRunning = true;
        for (int i = 0; i < pool_size; ++i)
            mThreads.create_thread(boost::bind(&lthThreadHandler::pool_main, this));
    }

    template<typename Task>
    void RunTask(Task task)
    {
        boost::unique_lock<boost::mutex> lock(mMutex);

        // If no threads are available, then return.
        if (0 == mAvailable)
            return;

        // Decrement count, indicating thread is no longer available.
        --mAvailable;

        // Set task and signal condition variable so that a worker thread will
        // wake up andl use the task.
        mTasks.push(boost::function<void()>(task));
        mCondition.notify_one();
    }

    void Wait() {
        try {
            mThreads.join_all();
        }
        // Suppress all exceptions.
        catch (...) {
        }
    }

    /// @brief Entry point for pool threads.
    void pool_main() {
        while (mRunning) {
            // Wait on condition variable while the task is empty and the pool is
            // still running.
            boost::unique_lock<boost::mutex> lock(mMutex);
            while (mTasks.empty() && mRunning) {
                mCondition.wait(lock);
            }
            // If pool is no longer running, break out.
            if (!mRunning)
                break;

            // Copy task locally and remove from the queue.  This is done within
            // its own scope so that the task object is destructed immediately
            // after running the task.  This is useful in the event that the
            // function contains shared_ptr arguments bound via bind.
            {
                boost::function<void()> task = mTasks.front();
                mTasks.pop();

                lock.unlock();

                // Run the task.
                try {
                    task();
                }
                // Suppress all exceptions.
                catch (...) {
                }
            }

            // Task has finished, so increment count of available threads.
            lock.lock();
            ++mAvailable;
        } // while mRunning
    }
private:
    std::queue<boost::function<void()> > mTasks;
    boost::thread_group mThreads;
    std::size_t mAvailable;
    boost::mutex mMutex;
    boost::condition_variable mCondition;
    bool mRunning;
};

This code snippet come from another SO answer: Thread pool using boost asio

Community
  • 1
  • 1
  • Do you have access to C++11? You may want an atomic boolean. – AndyG Jan 19 '16 at 03:00
  • 1
    How and where does `mRunning` get set to `false`? – AndyG Jan 19 '16 at 03:02
  • Does `mCondition.wait(lock);` keep `mMutex` locked for a long time (or indefinitely)? Something like that could cause trouble. – J.J. Hakala Jan 19 '16 at 03:08
  • @AndyG mRunning gets set to false in the destructor, not shown here –  Jan 19 '16 at 03:19
  • @AndyG No C++11. This is in a legacy system still on VS2010 –  Jan 19 '16 at 03:23
  • 1
    Then how do you make shure, that the destructor is called? When is it called? Have you placed a breakpoint, where running is set to false? - because your wait is not hanging indefinitely, your condition to leave the wait part is never fullfulled! – EGOrecords Jan 19 '16 at 03:55
  • 1
    `catch(...)` without rethrowing is a stupid idea, in particular on MS Windows, where it catches some other errors that are not C++ exceptions. Replace it with `catch(exception const&)` and log the error. That said, provide a minimal but complete example! As per the posting guidelines, your question is off-topic without it. – Ulrich Eckhardt Jan 19 '16 at 06:04

1 Answers1

0

Wait hangs indefinitely because the threads all run an infinite loop as long as you don't clear mRunning. Even if you did it might change your pet into a bag of popcorn because of the data race.


So. Basically you butchered someone's perfectly fine taskqueue/threadpool implementation. Mostly by renaming things idly¹ and then

  • removing the crucial destructor
  • adding a Wait() that just joins threads without doing anything to cause these threads to finish.
  • adding race conditions on the mRunning flag

Even if you put the original destructor logic into Wait() keep in mind you'll actually have a race condition where there is no guarantee that the posted tasks will run before the (main) thread resets the mRunning flag.

What Now?

I suggest you don't meddle with the code.

  • If there is something you need it to do, ask a constructive question like "How can I make sure that XXX" instead of "Why does my broken code not YYY"
  • If you want to learn ask about the original implementation, or about a single isolated change

Alternatives

I have a few thread similar pools that you might find easier to understand on [SO] so you could compare:

I answered a similar question once before (in response to the fact that you're trying to join_all from a member function instead of the destructor:

That answer could be very helpful to you explaining why that kind of function is tricky.


¹ pool_main, really? Thread manager, really? All the names were originally better chosen.

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633