Suppose you have a parallel-for loop implementation, e.g. ConcRT's parallel_for. Is it always best to put all the work inside a single loop body?
Take the following example:
    for(size_t i = 0; i < size(); ++i)
    {
        DoSomething(a[i], b[i]);
    }
    for(size_t i = 0; i < size(); ++i)
    {
        DoSomethingElse(a[i], b[i]);
    }
compared with
    for(size_t i = 0; i < size(); ++i)
    {
        DoSomething(a[i], b[i]);
        DoSomethingElse(a[i], b[i]);
    }
The second variant would seem the obvious way to go, but are there other considerations when it comes to parallel processing?
I just had a case where option 1 was faster than option 2 (~30 ms vs. ~38 ms on average) when both used parallel_for. However, I am not experienced at benchmarking parallel algorithms, so I may have measured incorrectly. Unfortunately, I cannot post the actual code for this observation.
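To make the comparison concrete, here is a minimal sketch of how the two variants could be timed against each other. It is not the actual code: `parallel_for_sketch` is a hypothetical stand-in for a library parallel-for (built on `std::thread` so it compiles anywhere), the bodies of `DoSomething` and `DoSomethingElse` are placeholders, and `median_ms`, `run_split`, and `run_fused` are names made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library parallel-for (NOT ConcRT's API):
// splits [0, n) into contiguous chunks and runs one chunk per thread.
template <typename Body>
void parallel_for_sketch(std::size_t n, Body body) {
    std::size_t workers = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    workers = std::min(workers, std::max<std::size_t>(1, n));
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        threads.emplace_back([=] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
    }
    for (auto& t : threads) t.join();
}

// Placeholder work; the real functions are unknown.
inline void DoSomething(double& a, double& b) { a += b; }
inline void DoSomethingElse(double& a, double& b) { b *= 0.5; b += a; }

// Option 1: two separate parallel loops.
void run_split(std::vector<double>& a, std::vector<double>& b) {
    assert(a.size() == b.size());
    parallel_for_sketch(a.size(), [&](std::size_t i) { DoSomething(a[i], b[i]); });
    parallel_for_sketch(a.size(), [&](std::size_t i) { DoSomethingElse(a[i], b[i]); });
}

// Option 2: one parallel loop doing both steps per element.
void run_fused(std::vector<double>& a, std::vector<double>& b) {
    assert(a.size() == b.size());
    parallel_for_sketch(a.size(), [&](std::size_t i) {
        DoSomething(a[i], b[i]);
        DoSomethingElse(a[i], b[i]);
    });
}

// Runs `run` several times and returns the median wall-clock time in ms.
// The median is less sensitive to one-off outliers than the mean.
template <typename F>
double median_ms(F run, int repeats = 9) {
    std::vector<double> samples;
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

A comparison would then look like `median_ms([&] { run_split(a, b); })` versus `median_ms([&] { run_fused(a, b); })` on equally sized inputs, built with optimizations enabled. Running each variant once before timing, as a warm-up, probably also matters, since the first parallel call may pay for thread start-up.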
Are there any rules of thumb or additional considerations, or is the only answer to try both and benchmark?