
I have code like this (pseudocode, because I can't show my program):

#include <ppl.h>
#include <concurrent_vector.h>
#include <concurrent_unordered_map.h>

concurrency::concurrent_vector<Input> a, b, c;
concurrency::concurrent_unordered_map<Key, Value> mapForResults;

// Sequential phase: fill a, b and c from the database.
for (int i = 0; i < sequenceCount; i++) {
  database->read(&a, &b, &c);
}

// Parallel phase: compute one result per sequence and store it in the map.
concurrency::parallel_for(0, sequenceCount, [&](int i) {
  auto aa = a[i];
  auto bb = b[i];
  auto cc = c[i];

  auto resultOfFunction = MakeFunction(aa, bb, cc);

  mapForResults.insert(resultOfFunction);
}, concurrency::static_partitioner());

It's working, but it's much slower than the sequential version. Any ideas why? It's my first time with ppl.h, so I don't know all the tips and tricks.

Queen

1 Answer


Every parallel version of a program takes more instructions than a single-threaded version. There's just an unavoidable overhead in setting up the threading and managing access to shared data. This is usually a limited overhead, and the extra instructions do not translate into extra time when you have sufficient cores available. Simply put, 10% overhead is not a big deal if you have 300% extra cores.

In this case, MakeFunction might be very small. That would mean you do have a lot of overhead, and your extra cores are spending all of their time fighting over mapForResults.
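
If MakeFunction really does the bulk of the work, the shared map can be taken out of the hot loop entirely. Below is a minimal sketch of one standard ppl.h pattern for that, concurrency::combinable, which gives each worker thread a private container and merges them once after the loop; `Key` and `Value` are placeholders for whatever key/value pair MakeFunction actually returns:

#include <ppl.h>
#include <unordered_map>
#include <utility>
#include <vector>

// Each thread appends to its own private vector; nothing is shared
// inside the parallel loop itself.
concurrency::combinable<std::vector<std::pair<Key, Value>>> perThread;

concurrency::parallel_for(0, sequenceCount, [&](int i) {
  perThread.local().push_back(MakeFunction(a[i], b[i], c[i]));
});

// Merge the per-thread vectors into the map once, after the loop.
std::unordered_map<Key, Value> mapForResults;
perThread.combine_each([&](const std::vector<std::pair<Key, Value>>& part) {
  mapForResults.insert(part.begin(), part.end());
});

With this layout the only synchronization left is the final combine_each pass, which runs once per thread rather than once per element.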

MSalters
  • Unfortunately MakeFunction is big. When I use it on a small database (about 30 sequences), it's twice as fast as the sequential version. But when I try to use it on a database of about 10000 sequences, it's much slower. – Queen Oct 03 '16 at 14:58
  • And one more question: if I'm using concurrent_vector or concurrent_unordered_map, can fighting still be a problem? I chose them instead of a critical_section because I thought there would be no fighting then. – Queen Oct 03 '16 at 15:00
  • "MakeFunction is big" is a little confusing - but then you follow up with "database". In adding the word database you changed your question. Perhaps you can add more information about what this "MakeFunction" is and does. – UKMonkey Oct 03 '16 at 15:03
  • Well, it's actually fortunate if (**if**) MakeFunction takes a lot of time. That means the different CPU cores are busy doing useful work in parallel. But the database code was in the code _before_ `parallel_for`, right? – MSalters Oct 03 '16 at 15:04
  • Right. Everything about the database is in another class, written by a professional. I got ready, working sequential code from a professional programmer to try to parallelize - it's an exercise for me. "MakeFunction" is a recursive index-building function, and for the small database example the parallel version is faster. But for the big database it's slower. I have no idea why. – Queen Oct 03 '16 at 15:13
  • Ok, I found where the problem is - it's the fighting over mapForResults. I was debugging without it and it ran fast. But I have no idea how to add the result of every iteration of parallel_for into one map. – Queen Oct 03 '16 at 15:54
  • @Queen: Why choose an unordered map (aka hashtable) as an output structure? Its O(1) lookup must be maintained on every insert, and that's not free. Using an ordinary vector of size `sequenceCount` might be easier. It doesn't need to be concurrent, because the container doesn't need to change size and every element is accessed once by a single thread (see the sketch after these comments). – MSalters Oct 03 '16 at 17:28
  • Okay @MSalters, but how do I put a pair into a vector? Can I do that? – Queen Oct 03 '16 at 17:29
  • Sure: `std::vector<std::pair<Key, Value>> example(sequenceCount);` – MSalters Oct 03 '16 at 17:30
  • Okay, I see how stupid my question was @MSalters :) I'll try it and will be back with results soon. – Queen Oct 03 '16 at 17:35
  • @MSalters it works! My code is 50% faster! :D But I don't understand why a vector is better than an unordered_map; I'll have to look for more information about it. Thank you for the help! – Queen Oct 03 '16 at 17:54
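
For completeness, a minimal sketch of the layout MSalters suggests in the comments above: a plain std::vector of pairs, sized up front to sequenceCount, so each iteration writes only to its own index and no concurrent container or locking is needed. `Key` and `Value` are again placeholders for whatever MakeFunction returns, and the hash map is built afterwards only if lookups are still required:

#include <ppl.h>
#include <unordered_map>
#include <utility>
#include <vector>

// Fixed-size output: element i is written by exactly one iteration,
// so a plain (non-concurrent) vector is safe here.
std::vector<std::pair<Key, Value>> results(sequenceCount);

concurrency::parallel_for(0, sequenceCount, [&](int i) {
  results[i] = MakeFunction(a[i], b[i], c[i]);
});

// Optional: build the unordered_map once, sequentially, if it is still needed.
std::unordered_map<Key, Value> mapForResults(results.begin(), results.end());

Because every slot is written exactly once by a single iteration, the parallel loop has no shared state at all; the per-insert hash maintenance that made concurrent_unordered_map expensive now happens once, outside the parallel region.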