boost thread worker object re-use after thread has finished

Question

I have a simple worker class that carries out some work on a thread.

In its simplest form, one object is created for every thread:

class worker
{
public:

    worker  (
              boost::atomic<int> & threads,
              boost::mutex & mutex,
              boost::condition_variable & condition
            )
    : threads__(threads), mutex__(mutex), condition__(condition)
    {}  

    void run (
                 // some params
             )
    {   
        // ... do the threaded work here
        // finally, decrease number of running threads and notify
        boost::mutex::scoped_lock lock(mutex__);
        threads__--;
        condition__.notify_one();
    }   

private:
    boost::atomic<int> & threads__;
    boost::mutex & mutex__;
    boost::condition_variable & condition__;
};

I use it inside a loop that runs at most 8 concurrent threads and waits to be notified when one has finished, in order to spawn the next one:

boost::thread_group thread_group;
boost::mutex mutex;
boost::condition_variable condition;
boost::atomic<int> threads(0);

// Some loop which can be parallelised
for ( const auto & x : list )
{
    // wait while 8 threads are already running
    boost::mutex::scoped_lock lock(mutex);
    while ( threads >= 8 ) 
         condition.wait( lock );

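    // Create worker object
    worker _wrk_( threads, mutex, condition );

    // Pass the worker by value: boost::thread stores its own copy, so the thread
    // never holds a pointer to a local that dies at the end of this iteration
    boost::thread * thread = new boost::thread( &worker::run, _wrk_, /* other params */ );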
    thread_group.add_thread( thread );
    threads++;
}

This works for most of my scenarios, but now I have a thread object which I need to re-use.

The reason is simple: this thread object contains thrust::device_vector<float> members, which are expensive to (re)allocate every time the object is destroyed and recreated.

Furthermore, those vectors can be re-used as most of their contents won't be changed.

Therefore, I am looking for a mechanism that can re-use the objects created within the loop. In fact, I will allocate 8 of those objects (or as many as I have concurrent threads) beforehand, and then use them over and over again. What I am hoping can be done is something like this:

boost::thread_group thread_group;
boost::mutex mutex;
boost::condition_variable condition;
boost::atomic<int> threads(0);

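// our worker objects to be reused: one distinct object per slot
// (the vector fill constructor would copy a single shared_ptr 8 times)
std::vector<std::shared_ptr<worker>> workers;
for ( int i = 0; i < 8; ++i )
    workers.push_back( std::make_shared<worker>( threads, mutex, condition ) );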

// Some loop which can be parallelised
for ( const auto & x : list )
{
    // wait while 8 threads are already running
    boost::mutex::scoped_lock lock(mutex);
    while ( threads >= 8 ) 
         condition.wait( lock );

    // get next available thread object from the vector
    auto _wrk_ = std::find_if(workers.begin(), workers.end(), is_available() );

    // if we have less than 8 threads but no available thread object
    if ( _wrk_ == workers.end() ) throw std::runtime_error ("...");

    // Use the first available worker object for this thread
    // (_wrk_ is an iterator to a shared_ptr, so get() yields the worker pointer)
    boost::thread * thread = new boost::thread(&worker::run, _wrk_->get());
    thread_group.add_thread( thread );
    threads++;
}

I don't know how to signal availability for is_available(), other than implementing it as a method of the worker class.

Second, this seems needlessly complex to me. I'm sure there is some other pattern I can use that is simpler and/or more elegant.
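For the is_available() part, one possible approach is a per-worker atomic "busy" flag that is claimed with a compare-and-swap and cleared at the end of run(). The following is only a sketch under that assumption; reusable_worker, try_acquire, release, make_workers and acquire_any are hypothetical names that do not appear in the code above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical worker that tracks whether it is currently in use.
class reusable_worker {
public:
    reusable_worker() : busy_(false) {}

    // Atomically claim the worker; returns false if it was already busy.
    bool try_acquire() {
        bool expected = false;
        return busy_.compare_exchange_strong(expected, true);
    }

    // Hand the worker back, e.g. as the last step of run().
    void release() {
        assert(busy_.load());  // releasing an unclaimed worker is a bug
        busy_.store(false);
    }

    bool is_available() const { return !busy_.load(); }

private:
    std::atomic<bool> busy_;
};

// Build n distinct workers, each constructed exactly once.
std::vector<std::unique_ptr<reusable_worker>> make_workers(std::size_t n) {
    std::vector<std::unique_ptr<reusable_worker>> v;
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(new reusable_worker());
    return v;
}

// Claim the first free worker, or return nullptr if all are busy.
reusable_worker* acquire_any(std::vector<std::unique_ptr<reusable_worker>>& ws) {
    for (auto& w : ws)
        if (w->try_acquire()) return w.get();
    return nullptr;
}
```

Claiming with try_acquire() rather than testing is_available() first avoids a window in which two callers could both see the same worker as free.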

I am not familiar with boost::thread_group, but from your usage it does not seem you are using it as a thread pool: you are creating a new thread for every request. I mean use a classic thread pool, where you start a pre-defined number of threads at the beginning, connect them to a message queue, and post 'work' requests to this queue to be picked up by the first available thread. — SergeyA, Nov 20 '15 at 16:40
@SergeyA I don't know how to do that. You are correct, I am creating new threads. The parameters to each thread change, but the objects remain the same. Those objects are very expensive to copy (they do host-device memory copies). Would you be so kind to provide a simplified example of what you mean? Should I be using a thread_pool such as this one: http://stackoverflow.com/questions/12215395/thread-pool-using-boost-asio/12267138#12267138 — Ælex, Nov 20 '15 at 16:44
This looks enough like a thread pool to me. You can google "thread pool"; it is a pretty widely used concept, which should be very applicable in your case. — SergeyA, Nov 20 '15 at 16:45
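The classic pool described in these comments can be sketched with the standard library alone. In this sketch each pool thread constructs its own long-lived state object once and hands it to every task it runs, which is one way to get the re-use the question asks for. All names here (worker_state, fixed_pool, post, run_demo) are hypothetical, and the std::vector<float> buffer merely stands in for the expensive device vectors.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a worker that owns expensive buffers.
struct worker_state {
    std::vector<float> buffer;
    worker_state() : buffer(1024, 0.0f) {}
};

class fixed_pool {
public:
    typedef std::function<void(worker_state&)> task;

    explicit fixed_pool(std::size_t n) : done_(false) {
        for (std::size_t i = 0; i < n; ++i)
            threads_.emplace_back(&fixed_pool::thread_proc, this);
    }

    // Lets the threads drain the queue, then joins them.
    ~fixed_pool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_) t.join();
    }

    void post(task t) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(t));
        }
        cv_.notify_one();
    }

private:
    void thread_proc() {
        worker_state state;  // constructed once per thread, reused per task
        for (;;) {
            task t;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // done_ is set and nothing is left
                t = std::move(queue_.front());
                queue_.pop();
            }
            t(state);  // note: an exception escaping a task would terminate
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<task> queue_;
    bool done_;
    std::vector<std::thread> threads_;  // declared last: started after the rest
};

// Runs `tasks` jobs on `n` threads and returns how many jobs completed.
int run_demo(std::size_t n, int tasks) {
    assert(n > 0);
    std::atomic<int> completed(0);
    {
        fixed_pool pool(n);
        for (int i = 0; i < tasks; ++i)
            pool.post([&completed, i](worker_state& w) {
                w.buffer[0] = static_cast<float>(i);  // reuse the thread's buffer
                ++completed;
            });
    }  // the destructor drains the queue and joins
    return completed.load();
}
```

With this layout there is no need for an is_available() lookup: a thread only ever touches its own worker_state, so the state is free exactly when the thread is.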

Richard Hodges · Accepted Answer · 2015-11-20T17:42:59.070

A very simple way to implement a thread pool is to use boost::asio.

Here is a complete example, including two types of task (a function and a function object) plus exception handling:

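#include <iostream>
#include <vector>
#include <thread>
#include <string>
#include <chrono>
#include <random>
#include <mutex>        // std::mutex, std::lock_guard, std::unique_lock
#include <functional>   // std::bind
#include <stdexcept>    // std::runtime_error
#include <condition_variable>
#include <boost/asio.hpp>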

void emit(const char* txt, int index)
{
    static std::mutex m;
    std::lock_guard<std::mutex> guard { m };
    std::cout << txt << ' ' << index << std::endl;
}

struct worker_pool
{
    boost::asio::io_service _io_service;
    boost::asio::io_service::work _work { _io_service };

    std::vector<std::thread> _threads;

    std::condition_variable _cv;
    std::mutex _cvm;
    size_t _tasks = 0;


    void start()
    {
        for (int i = 0 ; i < 8 ; ++i) {
            _threads.emplace_back(std::bind(&worker_pool::thread_proc, this));
        }
    }

    void wait()
    {
        std::unique_lock<std::mutex> lock(_cvm);
        _cv.wait(lock, [this] { return _tasks == 0; });
    }

    void stop()
    {
        wait();
        _io_service.stop();
        for (auto& t : _threads) {
            if (t.joinable())
                t.join();
        }
        _threads.clear();

    }

    void thread_proc()
    {
        while (!_io_service.stopped())
        {
            try {
                _io_service.run();
            }
            catch(const std::exception& e)
            {
                emit(e.what(), -1);
            }
        }
    }

    void reduce() {

        std::unique_lock<std::mutex> lock(_cvm);
        if (--_tasks == 0) {
            lock.unlock();
            _cv.notify_all();
        }
    }

    template<class F>
    void submit(F&& f)
    {
        std::unique_lock<std::mutex> lock(_cvm);
        ++ _tasks;
        lock.unlock();
        _io_service.post([this, f = std::forward<F>(f)]
                         {
                             try {
                                 f();
                             }
                             catch(...)
                             {
                                 reduce();
                                 throw;
                             }
                             reduce();
                         });


    }
};



void do_some_work(int index, std::chrono::milliseconds delay)
{
    emit("starting work item ", index);
    std::this_thread::sleep_for(delay);
    emit("ending work item ", index);
}

struct some_other_work
{
    some_other_work(int index, std::chrono::milliseconds delay)
    : _index(index)
    , _delay(delay)
    {}

    void operator()() const {

        emit("starting some other work ", _index);

        if (!(_index % 7)) {
            emit("uh oh! ", _index);
            using namespace std::string_literals;
            throw std::runtime_error("uh oh thrown in "s + std::to_string(_index));
        }

        emit("ending some other work ", _index);
    }

    int _index;
    std::chrono::milliseconds _delay;
};

auto main() -> int
{
    worker_pool pool;
    pool.start();

    std::random_device rd;
    std::default_random_engine eng(rd());

    std::uniform_int_distribution<int> dist(50, 200);
    for (int i = 0 ; i < 1000 ; ++i) {
        std::chrono::milliseconds delay(dist(eng));
        pool.submit(std::bind(do_some_work, i, delay));
        pool.submit(some_other_work(i, delay));
    }

    pool.wait();
    pool.stop();

    return 0;
}
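The listing above relies on two C++14 features: the init-capture `[this, f = std::forward<F>(f)]` in submit() and std::string_literals. If only C++11 is available, the capture can be approximated by copying the callable into a local of its decayed type and capturing that by value. The sketch below shows just that wrapping step, without asio; wrap_task and demo_wrap are hypothetical names, and the `done` callback plays the role of reduce().

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <type_traits>
#include <utility>

// C++11 stand-in for the C++14 init-capture: copy (or move) the callable
// into a local of its decayed type, then capture that local by value.
template <class F, class Done>
std::function<void()> wrap_task(F&& f, Done done)
{
    typename std::decay<F>::type task(std::forward<F>(f));
    return [task, done]() {
        try {
            task();
        } catch (...) {
            done();  // bookkeeping on the failure path
            throw;
        }
        done();      // bookkeeping on the success path
    };
}

// Runs one succeeding and one throwing task; returns the number of
// times the bookkeeping callback ran.
int demo_wrap()
{
    int finished = 0;
    auto done = [&finished] { ++finished; };

    std::function<void()> ok  = wrap_task([] {}, done);
    std::function<void()> bad =
        wrap_task([] { throw std::runtime_error("boom"); }, done);

    ok();
    assert(finished == 1);
    try { bad(); } catch (const std::runtime_error&) {}
    return finished;
}
```

The string literal suffix can be replaced in the same spirit with `std::string("uh oh thrown in ") + std::to_string(_index)`.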

Example output:

starting work item  0
starting some other work  0
starting work item  1
starting some other work  1
starting work item  2
starting some other work  2
starting work item  3
starting some other work  3
uh oh!  0
ending some other work  1
ending some other work  2
ending some other work  3
starting work item  4
uh oh thrown in 0 -1
starting some other work  4
starting work item  5
ending some other work  4
starting some other work  5
starting work item  6
ending some other work  5
starting some other work  6
ending some other work  6
starting work item  7
ending work item  0
starting some other work  7
uh oh!  7
uh oh thrown in 7 -1
starting work item  8
ending work item  1
starting some other work  8
ending some other work  8
starting work item  9
ending work item  5
starting some other work  9
ending some other work  9
starting work item  10
ending work item  7
starting some other work  10
ending some other work  10
starting work item  11
ending work item  4
starting some other work  11
ending some other work  11
starting work item  12
ending work item  3
starting some other work  12
ending some other work  12
starting work item  13
ending work item  10
ending work item  6
starting some other work  13
starting work item  14
ending some other work  13
starting some other work  14
uh oh!  14
uh oh thrown in 14 -1
...

Thanks for the answer, Richard. My only question in this scenario is: should my thread_pool encapsulate and control my thread worker objects? You use a function as the thread, whereas I want to encapsulate an object around the thread work. — Ælex, Nov 20 '15 at 17:31
Alex the worker != the task. The worker awaits a task and the task runs on the worker. Encapsulation is the same (Same goes in my answer that uses a non-asio implementation) — sehe, Nov 20 '15 at 17:32
@Alex in my example, the worker_pool object is the thread pool, to which you can submit any asynchronous task via `submit()`. An asynchronous task can be anything that's callable with no arguments, so may be an object provided it supports `operator()` — Richard Hodges, Nov 20 '15 at 17:34
@RichardHodges This is what I don't understand: I replaced your emit function (I don't have c++14 and CUDA 6.5 won't work with it) with printf, and then printed the `this` of `some_other_work` and some are the same, some change. How are they managed? Do I get 8 different ones? This would require I take a different approach if I can figure it out, but I can make it work. I assume it is the loop constructor in `start`? — Ælex, Nov 20 '15 at 17:55
In my case each one is a distinct object, which is normally what you want. If you want to share the implementation of a worker amongst multiple workers, you would simply refactor `some_other_work` to be a handle containing a shared pimpl. — Richard Hodges, Nov 20 '15 at 17:57
Yeah, it seems I have to take a different approach, because copies are not what I want. Many thanks though, this answers my question! — Ælex, Nov 20 '15 at 17:59
It's probably worth mentioning that the fewer objects you share between threads (including worker objects), the better. With multi-threading, copying data is usually much quicker than sharing data across threads, because the memory fence associated with a mutex is *extremely* expensive. — Richard Hodges, Nov 20 '15 at 18:00
In this case I am sharing CUDA device vectors across workers. Some will have to be copied per worker, but some (I am guessing) would be better accessed as shared. What I am trying to avoid is copying them, because it is expensive, and most operations do not change their contents. However, your answer pointed me in the right direction, many thanks! — Ælex, Nov 20 '15 at 18:05
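The "handle containing a shared pimpl" suggested in the comments might look roughly like the following. This is a sketch under assumptions, not the answerer's code: work_impl, work_handle and copies_share_impl are hypothetical names, and std::vector<float> stands in for thrust::device_vector<float>. The data is held through a pointer-to-const, so concurrent tasks can read it without a mutex as long as nothing mutates it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical expensive, read-mostly state shared between tasks.
struct work_impl {
    explicit work_impl(std::size_t n) : data(n, 1.0f) {}
    std::vector<float> data;
};

// Cheap-to-copy task handle: copying it copies a pointer, not the data.
class work_handle {
public:
    work_handle(std::shared_ptr<const work_impl> impl, int index)
        : impl_(std::move(impl)), index_(index)
    {
        assert(impl_);  // a handle without an implementation is a bug
    }

    // Callable with no arguments, so it fits a pool's submit();
    // it only reads the shared data.
    float operator()() const {
        float sum = static_cast<float>(index_);
        for (float v : impl_->data) sum += v;
        return sum;
    }

    const work_impl* impl() const { return impl_.get(); }

private:
    std::shared_ptr<const work_impl> impl_;
    int index_;
};

// Returns true when a copied handle refers to the same implementation
// object as the original rather than to a copy of the data.
bool copies_share_impl() {
    std::shared_ptr<const work_impl> impl = std::make_shared<work_impl>(1024);
    work_handle a(impl, 1);
    work_handle b = a;  // copies the handle only
    return a.impl() == b.impl() && impl.use_count() == 3;
}
```

Data that a task does need to modify would still be better kept per worker, in line with the advice above about sharing as little as possible between threads.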
