Boost Thread_Group in a loop is very slow

Question

I want to use threading to check multiple images in a vector at the same time. Here is the code:

        boost::thread_group tGroup;
        for (int line = 0; line < sourceImageData.size(); line++) {
            for (int pixel = 0; pixel < sourceImageData[line].size(); pixel++) {
                // One thread per image for this single pixel.
                for (int im = 0; im < m_images.size(); im++) {
                    tGroup.create_thread(boost::bind(&ClassX::ClassXFunction, this, line, pixel, im));
                }
                // Wait for every thread before moving to the next pixel.
                tGroup.join_all();
            }
        }

This creates the thread group, then loops through the lines of pixel data, each pixel in a line, and then each image. It's a weird project, but anyway: I bind each thread to a method on the same instance of the class this code is in, which is why "this" is used. For each pixel, the inner loop runs through a population of about 20 images, creating one thread per image, and then join_all waits until those threads are done. Then it moves on to the next pixel and starts over.

I've tested running 50 threads at the same time with this simple program:

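#include <iostream>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

// Each thread prints 100 lines tagged with its index.
void run(int index) {
    for (int i = 0; i < 100; i++) {
        std::cout << "Index : " << index << "   " << i << std::endl;
    }
}

int main() {
    boost::thread_group tGroup;

    // Start all 50 threads up front.
    for (int i = 0; i < 50; i++) {
        tGroup.create_thread(boost::bind(run, i));
    }

    // Join once, after every thread has been created.
    tGroup.join_all();
    int done;
    std::cin >> done;  // wait for input before exiting
    return 0;
}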

This runs very quickly. Even though the method the threads are bound to in the first program is more complicated, it shouldn't be as slow as it is: it takes about 4 seconds for one iteration of the outer loop over sourceImageData (line) to complete. I'm new to Boost threading, so I don't know whether something is blatantly wrong with the nested loops or elsewhere. Any insight is appreciated.

score 0 · Answer 1 · answered Feb 26 '16 at 03:36

I believe the difference here is in when you decide to join the threads.

In the first piece of code, you join the threads at every pixel of the supposed source image. In the second piece of code, you only join the threads once at the very end.

Thread synchronization is expensive and often a bottleneck for parallel programs: at each join, no new threads are started until ALL the threads being joined, which in this case means every active thread, have finished running.

If the iterations of the innermost loop (the one with im) do not depend on each other, I would suggest joining the threads only after the entire outermost loop is done.

Thanks, that helped speed it up a bit – Liger Feb 26 '16 at 21:57

score 0 · Accepted Answer · edited May 23 '17 at 12:07

The answer is simple. Don't start that many threads. Consider starting as many threads as you have logical CPU cores. Starting threads is very expensive.

Certainly never start a thread just to do one tiny job. Keep the threads and give them lots of (small) tasks using a task queue.
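A minimal sketch of the "keep the threads, feed them tasks" idea, under a few assumptions: it uses the C++11 standard library rather than Boost so that it stands alone, and `TaskQueue`, `post` and `workerLoop` are hypothetical names invented for this illustration, not part of any library. The same structure should carry over to Boost threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not a Boost class): a fixed set of worker threads
// that pull small tasks from one shared queue.
class TaskQueue {
public:
    explicit TaskQueue(std::size_t workerCount) {
        assert(workerCount > 0);
        for (std::size_t i = 0; i < workerCount; ++i) {
            workers_.emplace_back([this] { workerLoop(); });
        }
    }

    // Lets the workers drain every queued task, then joins them.
    ~TaskQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& worker : workers_) {
            worker.join();
        }
    }

    TaskQueue(const TaskQueue&) = delete;
    TaskQueue& operator=(const TaskQueue&) = delete;

    // Queues a task; no new thread is created here.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // stopping_ is set and nothing is left to run
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the state above exists
};
```

With something like this, the caller would create one `TaskQueue` sized to the number of logical cores (for instance from `std::thread::hardware_concurrency()`, falling back to a fixed number if it returns 0) and post many tasks to it, instead of creating a thread per task.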

See here for a good example where the number of threads was similarly the issue: boost thread throwing exception "thread_resource_error: resource temporarily unavailable"

In this case I'd think you can gain a lot of performance by increasing the size of each task (for example, don't create one task per pixel, but one per scan line).
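One way the per-scan-line suggestion could look, as a sketch only: a small, fixed number of threads take whole line indices from a shared counter, and there is a single join at the end. It uses standard-library threads instead of Boost to stay self-contained, and `forEachLine` and `processLine` are hypothetical names. In the question's code, `processLine` would be the place for the `pixel` and `im` loops that call `ClassXFunction`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: calls processLine(line) exactly once for every
// line in [0, lineCount), using at most threadCount threads.
void forEachLine(std::size_t lineCount, std::size_t threadCount,
                 const std::function<void(std::size_t)>& processLine) {
    assert(threadCount > 0);
    std::atomic<std::size_t> nextLine(0);

    auto worker = [&] {
        // Each fetch_add hands out one whole scan line, so no thread is
        // ever created for a single pixel.
        for (std::size_t line = nextLine.fetch_add(1); line < lineCount;
             line = nextLine.fetch_add(1)) {
            processLine(line);
        }
    };

    std::vector<std::thread> threads;
    const std::size_t count = std::min(threadCount, lineCount);
    for (std::size_t i = 0; i < count; ++i) {
        threads.emplace_back(worker);
    }
    for (std::thread& t : threads) {
        t.join();  // the only join: once, after all lines are handed out
    }
}
```

This assumes that lines can be processed independently of each other; if `processLine` writes to shared state, that state still needs its own protection.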

answered Feb 26 '16 at 11:20 by sehe · edited May 23 '17 at 12:07 by Community

Thanks, that helped speed it up a lot. I made a thread under the first/main loop (line) like you said and moved the other two into the function that is called by the thread. Is there anything else that can be done to make it faster, such as cleanup, or making it so only like 8 or 10 threads are active at a time and just calling the function normally until a 'slot' opens up? – Liger Feb 26 '16 at 21:56
The way to ensure only _n_ threads are active is by not having more of them. So it's more about chunking up your processing across the available threads smartly. You should probably focus on not having tasks too small (too much overhead) and not too large (too little concurrency) – sehe Feb 26 '16 at 21:59
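To make the chunking idea in the last comment concrete, here is a small sketch that splits a range of items (scan lines, say) into contiguous chunks of nearly equal size. `splitIntoChunks` is a hypothetical helper written for this illustration. Choosing the chunk count as a small multiple of the thread count is an assumption made here as a starting point, not something stated in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: splits [0, itemCount) into at most chunkCount
// contiguous half-open ranges [begin, end) whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
splitIntoChunks(std::size_t itemCount, std::size_t chunkCount) {
    assert(chunkCount > 0);
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = itemCount / chunkCount;   // minimum chunk size
    const std::size_t extra = itemCount % chunkCount;  // chunks that get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < chunkCount && begin < itemCount; ++i) {
        const std::size_t size = base + (i < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + size);
        begin += size;
    }
    return chunks;
}
```

For example, `splitIntoChunks(10, 4)` yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10). Each range can then become one task, so fewer chunks mean less per-task overhead and more chunks mean more concurrency.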
