Can't provide a C++11-specific answer since we're still mostly using pthreads. But, as a language-agnostic answer, you parallelise something by setting it up to run in a separate function (the thread function).
In other words, you have a function like:
    def processArraySegment (threadData):
        arrayAddr = threadData->arrayAddr
        startIdx = threadData->startIdx
        endIdx = threadData->endIdx

        for i = startIdx to endIdx:
            doSomethingWith (arrayAddr[i])

        exitThread()
and, in your main code, you can process the array in two chunks:
    int xyzzy[100]

    threadData->arrayAddr = xyzzy
    threadData->startIdx = 0
    threadData->endIdx = 49
    threadData->done = false
    tid1 = startThread (processArraySegment, threadData)

    // caveat coder: see below.

    threadData->arrayAddr = xyzzy
    threadData->startIdx = 50
    threadData->endIdx = 99
    threadData->done = false
    tid2 = startThread (processArraySegment, threadData)

    waitForThreadExit (tid1)
    waitForThreadExit (tid2)
(keeping in mind the caveat that you should ensure thread 1 has loaded the data into its local storage before the main thread starts modifying it for thread 2, possibly with a mutex or by using an array of structures, one per thread).
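Since the question asks about C++11, here's a rough, untested sketch of how that same split might look with std::thread. It sidesteps the caveat above by giving each thread its own small structure instead of reusing one shared threadData; the names (ThreadData, doSomethingWith, and so on) just mirror the pseudocode and are placeholders for whatever your real code does:

    #include <thread>
    #include <cstddef>

    // Placeholder for whatever per-element work needs doing.
    static void doSomethingWith (int &value) {
        value = value + 1;
    }

    // One of these per thread, so the main thread never has to reuse
    // (and therefore modify) a structure another thread is still reading.
    struct ThreadData {
        int         *arrayAddr;
        std::size_t  startIdx;
        std::size_t  endIdx;
    };

    // The thread function: processes only its own segment of the array.
    static void processArraySegment (ThreadData td) {
        for (std::size_t i = td.startIdx; i <= td.endIdx; ++i)
            doSomethingWith (td.arrayAddr[i]);
    }

    int main () {
        int xyzzy[100] = {0};

        ThreadData half1 = { xyzzy,  0, 49 };
        ThreadData half2 = { xyzzy, 50, 99 };

        // Each thread receives its own copy of its descriptor, so there
        // is no shared threadData to protect with a mutex.
        std::thread t1 (processArraySegment, half1);
        std::thread t2 (processArraySegment, half2);

        t1.join ();
        t2.join ();

        return 0;
    }

If you want more than two chunks, the same idea extends to an array of ThreadData and a std::vector of std::thread objects, joining them all at the end.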
In other words, it's rarely a simple matter of just modifying a for loop so that it runs in parallel, though that would be nice. Something like:
    for {threads=10} ({i} = 0; {i} < ARR_SZ; {i}++)
        array[{i}] = array[{i}] + 1;
Instead, it requires rearranging your code a bit to take advantage of threads.
And, of course, you have to ensure that it makes sense for the data to be processed in parallel. If you're setting each array element to the previous one plus 1, no amount of parallel processing will help, simply because you have to wait for the previous element to be modified first.
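For instance, in something like the following (the function name is just for illustration), each iteration needs the result of the one before it, so no split across threads can make the iterations independent:

    #include <cstddef>

    // Each element depends on the previous element's *new* value, so the
    // iterations form a chain and cannot run out of order or in parallel.
    void cumulativeFill (int *array, std::size_t size) {
        for (std::size_t i = 1; i < size; ++i)
            array[i] = array[i - 1] + 1;
    }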
The threaded examples further up simply use an argument passed to the thread function to specify which part of the array that thread should process. The thread function itself contains the loop that does the work.