Is it possible to convert my loop into parallel_for fnc invocation?

Question

I'm having a for loop which I would like to convert into parallel_for fnc invocation. My code meets all the criteria (described in Parallel_Programming_with_Microsoft_Visual_C_plus_plus,p.7) for this conversion to be successfull, yet I find it difficult to implement. Here is my example:

//"Oridinary" for
//numbers_from_file_ is a vector<Big_Int> loaded with Big_Int 
//results_ is a vector<Big_Int>

for (unsigned i = 0; i < numbers_from_file_.size(); i += 2)//+2 to skip to another pair
{
    results_.push_back( numbers_from_file_[i] * numbers_from_file_[i + 1]);
}

The scenario is that each pair of numbers from numbers_from_file_ is multiplied out and stored in results_. In order to make it work the variable i has to be incremented by two (to skip to another pair). Unfortunately example in this book is showing how to convert body of for loop into parallel_for fnc invocation only if i is being incremented by one.
Is it possible to convert my loop into parallel_for fnc invocation?

Is that really your code, or is that just an example? Because if that's your code, threading will _hurt_ your performance, not help it. There is overhead to spawning a thread; in general, for threading to be a win, you need to do more work in the thread than the cost of the overhead. And multiplying a couple of numbers together that are stored in a buffer isn't it. Even if those numbers are `Big_Int` classes, the time it takes to multiply them is negligable. You need to break this down into groups of numbers that you multiply together so that each thread does lots of multiplications. — Nicol Bolas, Jun 30 '11 at 08:17
@smallB - Is that the best title you could come up with for this question? — razlebe, Jun 30 '11 at 08:21
@Nicol it may be true what you're saying, and probably in this example it is true. The fact is that I'm trying to learn about those things and I don't really care much about perf. Thanks for your input though. — smallB, Jun 30 '11 at 08:22
@Nicol Bolas: If the vector is large, then it could easily be a win. A couple of numbers, then I'd agree- but for any N-sized buffer, once N becomes of a certain size, then it absolutely becomes a win. — Puppy, Jun 30 '11 at 10:15
@DeadMG: How? In order for it to be a win, the cost of each thread's overhead (call it X) divided by the number of concurrent threads (Z), must be less than the cost of each iteration of the loop (Y). Y is probably around 40 cycles, perhaps a bit more depending on what Big_Int store. X is almost certainly on the order of hundreds of cycles, even when picking from a thread pool. There are startup and shutdown costs associated with threading. It doesn't matter how big N is, unless Y is larger than the overhead of running a thread, it is a performance loss. — Nicol Bolas, Jun 30 '11 at 10:49
@Nicol Bolas +1 hey, thanks for your input. Most appreciated. You should post it as an answer so I could upvote it. Thanks again. After all I've decided to go with something portable instead of ppl from MS. It is just unacceptable for me. — smallB, Jun 30 '11 at 11:09

score 1 · Accepted Answer · answered Jun 30 '11 at 08:19

1

MSDN says that there is a step parameter. Use it.

answered Jun 30 '11 at 08:19

Yakov Galka

70,775
16
139
220

score 1 · Answer 2 · answered Jun 30 '11 at 10:14

Yes, you can. You will need to resize results first and then add to it.

results_.resize(numbers_from_file.size() / 2);
parallel_for(static_cast<std::size_t>(0), numbers_from_file_.size(), static_cast<std::size_t>(2), [&](std::size_t i) {
    results_[i / 2] = numbers_from_file_[i] * numbers_from_file_[i + 1]);
});

This is just quick, of course, no guarantees, but it should be pretty directly replacable.

score 0 · Answer 3 · answered Jun 30 '11 at 10:08

0

Your code appears to be identical to

for (unsigned j = 0; j < numbers_from_file_.size()/2; ++j)
{
    results_.push_back( numbers_from_file_[2*j] * numbers_from_file_[2*j + 1]);
}

answered Jun 30 '11 at 10:08

MSalters

173,980
10
155
350

it may appear to be identicall to what you've produced but doesn't do any divisions nor so many multiplications ;) – smallB Jun 30 '11 at 10:18
@smallB: those are free. You need a multiplication by `sizeof(numbers_from_file::value_type)` anyway, at hardware level. And the division is a one-off bitshift. – MSalters Jun 30 '11 at 10:49
that is true what you're saying about bit shifting, yet they are not really free (there is some cost of bit shifting, doesn't matter how small). Second issue is that your code is definitelly less obvious to what I've got (but that arguable and I wouldn't like to argue ;).Anyway, after all this I've decided that I won't use ppl.h for the simple fact that it would bind me to VS and I'm just not prepare to accept that. There are other options so why not use them? But thanks for your answer. – smallB Jun 30 '11 at 11:06
@smallB: Instructions can truely be free, as in zero overhead. This happens especially when they're near other instructions that are independent and take a significant amount of time, such as a memory access. That's the case here; you'll be reading `results[0]` directly afterwards. Since that memory access doesn't depend on the division, it can start early, and the division can execute in parallel with the read. – MSalters Jun 30 '11 at 11:26
fair enough, thank you for your input, but as I've said: ppl? no thank you, I write my code to be portable. ;) – smallB Jun 30 '11 at 11:31

Is it possible to convert my loop into parallel_for fnc invocation?

3 Answers3

Linked