Parallelize a loop using std::thread and good practices

Question

Possible Duplicate:
C++ 2011 : std::thread : simple example to parallelize a loop?

Consider the following program that distributes a computation over the elements of a vector (I have never used std::thread before):

// vectorop.cpp
// compilation: g++ -O3 -std=c++0x vectorop.cpp -o vectorop -lpthread
// execution: time ./vectorop 100 50000000 
// (100: number of threads, 50000000: vector size)
#include <iostream>
#include <iomanip>
#include <cstdio>
#include <vector>
#include <thread>
#include <cmath>
#include <algorithm>
#include <numeric>

// Some calculation that takes some time
template<typename T> 
void f(std::vector<T>& v, unsigned int first, unsigned int last) {
    for (unsigned int i = first; i < last; ++i) {
        v[i] = std::sin(v[i])+std::exp(std::cos(v[i]))/std::exp(std::sin(v[i])); 
    }
}

// Main
int main(int argc, char* argv[]) {

    // Variables
    const int nthreads = (argc > 1) ? std::atol(argv[1]) : (1);
    const int n = (argc > 2) ? std::atol(argv[2]) : (100000000);
    double x = 0;
    std::vector<std::thread> t;
    std::vector<double> v(n);

    // Initialization
    std::iota(v.begin(), v.end(), 0);

    // Start threads
    for (unsigned int i = 0; i < n; i += std::max(1, n/nthreads)) {
        // question 1: 
        // how to compute the first/last indexes attributed to each thread 
        // with a more "elegant" formula ?
        std::cout<<i<<" "<<std::min(i+std::max(1, n/nthreads), v.size())<<std::endl;
        t.push_back(std::thread(f<double>, std::ref(v), i, std::min(i+std::max(1, n/nthreads), v.size())));
    }

    // Finish threads
    for (unsigned int i = 0; i < t.size(); ++i) {
        t[i].join();
    }
    // question 2: 
    // how to be sure that all threads are finished here ?
    // how to "wait" for the end of all threads ?

    // Finalization
    for (unsigned int i = 0; i < n; ++i) {
        x += v[i];
    }
    std::cout<<std::setprecision(15)<<x<<std::endl;
    return 0;
}

There are already two questions embedded in the code.

A third one would be: is this code completely OK, or could it be written in a more elegant way using std::thread? I do not know the "good practices" for std::thread...

"question 2" is answered by the comment immediately preceding it. — Seth Carnegie, Dec 26 '12 at 16:49
The comment is mine, so I do not know whether a join loop finishes all threads before moving on to the next instruction. — Vincent, Dec 26 '12 at 16:55
For elegance, you probably want to use an `std::future` instead of using threads directly at all. — Jerry Coffin, Dec 26 '12 at 16:55
@JerryCoffin: Could you provide an example of a code doing the same thing in the most elegant way you have in mind ? — Vincent, Dec 26 '12 at 17:00
Here, suit yourself: http://parlab.eecs.berkeley.edu/wiki/_media/patterns/paraplop_g1_3.pdf — Adri C.S., Dec 26 '12 at 17:05
See my answer here: http://stackoverflow.com/a/10796261/893693 — Stephan Dollberg, Dec 26 '12 at 22:08

score 0 · Accepted Answer · edited May 23 '17 at 10:28

On the first question, how to compute the range each thread works on: I extracted the constants and gave them names to make the code easier to read. As a matter of good practice I also used a lambda, which makes the code easier to modify: the code in the lambda is only ever used here, while the function f can be reused from other code throughout the program. Use this split to put shared code in a function and specialized code that is only used once in a lambda.

// n, nthreads, v, t and f are the variables and function from the question
const size_t itemsPerThread = std::max(1, n / nthreads);
for (size_t nextIndex = 0; nextIndex < v.size(); nextIndex += itemsPerThread)
{
    const size_t beginIndex = nextIndex;
    const size_t endIndex = std::min(nextIndex + itemsPerThread, v.size());
    std::cout << beginIndex << " " << endIndex << std::endl;
    // v is captured by reference, the two indexes by value
    t.push_back(std::thread([&v, beginIndex, endIndex]{ f(v, beginIndex, endIndex); }));
}
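
If exactly nthreads threads are wanted, note that the fixed-chunk loop above can start one more: with n = 10 and nthreads = 3 the chunk size is 3, so ranges begin at 0, 3, 6 and 9 and four threads are launched. One possible alternative, sketched here with a hypothetical helper named splitRange (not part of the original code), gives every range n / parts elements and one extra element to the first n % parts ranges:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption, not from the original code):
// splits [0, n) into exactly `parts` contiguous half-open ranges
// whose sizes differ by at most one element.
std::vector<std::pair<std::size_t, std::size_t>>
splitRange(std::size_t n, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (parts == 0) return ranges;        // nothing to split into
    const std::size_t base = n / parts;   // minimum size of every range
    const std::size_t extra = n % parts;  // first `extra` ranges get one more
    std::size_t begin = 0;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t end = begin + base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, end);  // may be empty when n < parts
        begin = end;
    }
    return ranges;
}
```

Each pair can then be passed to f as (first, last). With n = 10 and parts = 3 the ranges are [0, 4), [4, 7) and [7, 10).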

A more advanced design would use a thread pool, but how that looks depends on your application design, and the C++ standard library does not provide one. For a good example of a threading model, see the Qt framework. If you are just getting started with threads, save this for later.
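
Short of a thread pool, a comment on the question suggests std::future instead of raw threads. A minimal sketch of that idea with std::async follows; the helper names transformAndSum and parallelTransformAndSum are hypothetical, and returning partial sums from each task is an assumption made here to show what a future adds over a plain thread:

```cpp
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: applies the question's formula to v[first, last)
// in place and returns the sum of the updated elements.
double transformAndSum(std::vector<double>& v, std::size_t first, std::size_t last) {
    double sum = 0.0;
    for (std::size_t i = first; i < last; ++i) {
        v[i] = std::sin(v[i]) + std::exp(std::cos(v[i])) / std::exp(std::sin(v[i]));
        sum += v[i];
    }
    return sum;
}

// Hypothetical driver: launches `ntasks` asynchronous tasks over disjoint
// ranges of v and adds up the partial sums they return.
double parallelTransformAndSum(std::vector<double>& v, std::size_t ntasks) {
    if (ntasks == 0) ntasks = 1;
    const std::size_t n = v.size();
    std::vector<std::future<double>> parts;
    for (std::size_t k = 0; k < ntasks; ++k) {
        // Assumes k * n does not overflow std::size_t.
        const std::size_t first = k * n / ntasks;
        const std::size_t last = (k + 1) * n / ntasks;
        parts.push_back(std::async(std::launch::async,
            [&v, first, last] { return transformAndSum(v, first, last); }));
    }
    double total = 0.0;
    for (auto& p : parts) total += p.get();  // get() blocks until the task is done
    return total;
}
```

Because get() waits for each task and hands back its result, the separate join loop and the final summation loop collapse into one.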

The second question was already answered in the comments. The std::thread::join function waits (blocks) until the thread has finished. Once you have called join on every thread and execution reaches the code after the join loop, you can be sure that all the threads have finished and that the std::thread objects can safely be destroyed.
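
As a minimal sketch of that join pattern, the hypothetical helper runInThreads below (a name introduced here for illustration) starts the threads and then joins each one, so all of the work has completed by the time the function returns. It assumes thread creation does not throw partway through the loop, which a production version would need to handle.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: calls work(first, last) on `nthreads` threads over
// disjoint ranges of [0, n) and returns only after every thread has finished.
template <typename Work>
void runInThreads(std::size_t n, std::size_t nthreads, Work work) {
    if (nthreads == 0) nthreads = 1;
    std::vector<std::thread> threads;
    threads.reserve(nthreads);
    for (std::size_t k = 0; k < nthreads; ++k) {
        // Assumes k * n does not overflow std::size_t.
        const std::size_t first = k * n / nthreads;
        const std::size_t last = (k + 1) * n / nthreads;
        threads.emplace_back(work, first, last);  // each thread gets a copy of work
    }
    // join() blocks until the corresponding thread is done, so once this
    // loop ends no worker is still running.
    for (std::thread& th : threads) {
        th.join();
    }
}
```

A caller would pass a lambda that captures the vector by reference, for example one that calls f(v, first, last), and can read the results as soon as runInThreads returns.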

`std::iota` and the fill constructor do very different things. They cannot be interchanged. — Kyle Lutz, Dec 26 '12 at 17:41
Misread `iota` as `itoa`. Removed the part referring to its use. — Peter, Dec 26 '12 at 19:50
