
I want to run two completely different functions simultaneously. I tried the following to check whether it works:

#pragma omp parallel
{
    #pragma omp single nowait
    {
        #pragma omp task
        {
            for (unsigned long long i = 0; i < numSteps; i++)
                sum1 = sum1 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
        }
        #pragma omp task
        {
            for (unsigned long long i = 0; i < numSteps; i++)
                sum2 = sum2 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
        }
        #pragma omp taskwait
    }
}

With only one cycle it finishes in 10 s and loads my CPU at just over 100%.

With the 2 cycles that I want to run simultaneously on two different cores, it finishes in 24 s and loads the CPU at over 200%, but I expected close to 10 s.

Without #pragma omp single nowait, the calculation finishes in 138 s and loads my CPU at over 400%.

What am I doing wrong?

Groundexp
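For reference, here is a minimal compilable sketch of the snippet above. The question does not show the declarations, so the types and values of numSteps, step, sum1 and sum2 below are assumptions chosen only to make the example build and run, and timing here uses omp_get_wtime().

/* Minimal compilable sketch of the snippet above (compile with e.g.
 * gcc -fopenmp example.c). The declarations are not shown in the
 * question, so the types and initial values below are assumptions. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    const unsigned long long numSteps = 1000000000ULL;  /* assumed value */
    const double step = 1.0 / (double)numSteps;         /* assumed value */
    double sum1 = 0.0, sum2 = 0.0;                       /* adjacent doubles, as the question implies */

    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        #pragma omp single nowait
        {
            #pragma omp task
            {
                for (unsigned long long i = 0; i < numSteps; i++)
                    sum1 = sum1 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
            }
            #pragma omp task
            {
                for (unsigned long long i = 0; i < numSteps; i++)
                    sum2 = sum2 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
            }
            #pragma omp taskwait
        }
    }
    double t1 = omp_get_wtime();

    /* Use the results so they cannot be optimised away. */
    printf("sum1*step = %f, sum2*step = %f, elapsed = %f s\n",
           sum1 * step, sum2 * step, t1 - t0);
    return 0;
}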
  • What do you mean by cycle? – Z boson Apr 20 '16 at 08:31
  • @Zboson I mean only one "for()", not two of them. – Groundexp Apr 20 '16 at 10:13
  • @CraigYoung The main problem is that without "single" nothing changes; the time is near 22 s. And with "single" it loads not 1 core, it loads 2 cores. – Groundexp Apr 20 '16 at 10:15
  • How do you test one cycle? Do you mean you use a task only once? What are your compiler options? Do you use `-O3`? – Z boson Apr 20 '16 at 10:25
  • Your `taskwait` is misplaced. It should be **after** the `single` construct, not inside it. Also, how do you measure the elapsed time? – Hristo Iliev Apr 20 '16 at 10:29
  • @Zboson I just comment out the other task so there is only one for-loop. I compile with -O0. – Groundexp Apr 20 '16 at 10:43
  • @HristoIliev I've just edited the code as you said, but it still loads 200% and takes 24 s. No difference. I measure the time with time(NULL) from time.h. – Groundexp Apr 20 '16 at 10:46
  • @CraigYoung There is a misunderstanding. With "single" it uses 1 thread per task, and there are 2 tasks, so it loads the CPU at 200%. That's exactly what I need, but I don't get the same time as when there is only 1 task on 1 core. Again, with or without single there is no improvement. And I have changed all the variables so that none are shared between the two loops, and the time is still the same. – Groundexp Apr 20 '16 at 11:20
  • @Groundexp Okay then, sorry I couldn't be more help. I'll delete my other comments to avoid leading others astray. (I suggest you add the declarations to your code sample though. It might be relevant to someone more familiar with OMP.) – Disillusioned Apr 20 '16 at 11:28
  • @Groundexp, it's most likely because your `sum1` and `sum2` variables are next to each other in memory and end up in the same cache line, therefore a cache ping-pong ensues, formally known as _false sharing_. Check if that's the case by modifying the first task construct to read `#pragma omp task firstprivate(sum1)` and the second one to read `#pragma omp task firstprivate(sum2)` (see the first sketch after these comments). – Hristo Iliev Apr 20 '16 at 12:13
  • I think it's pretty pointless to benchmark without optimization. You should use `-O3`, make sure `sum1` and `sum2` are private inside the loop, but somehow you need to use them after the loop so they are not optimized away (see the second sketch after these comments). – Z boson Apr 20 '16 at 13:14
  • Seconding @Zboson. Never benchmark performance unless you're compiling with `-O3`. – NoseKnowsAll Apr 20 '16 at 15:48
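A sketch of the false-sharing check Hristo Iliev suggests in the comments: with firstprivate each task accumulates into its own private copy, so the two hot loops no longer touch the same cache line. If the run time drops back to roughly the single-task time, false sharing was the cause. The private copies are discarded when the tasks finish, so this is only a diagnostic, not a fix.

#pragma omp task firstprivate(sum1)
{
    /* sum1 is now a task-private copy; the loop no longer shares a
       cache line with the other task's accumulator */
    for (unsigned long long i = 0; i < numSteps; i++)
        sum1 = sum1 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
}
#pragma omp task firstprivate(sum2)
{
    for (unsigned long long i = 0; i < numSteps; i++)
        sum2 = sum2 + 4.0/(1. + (i + .5)*step*(i + .5)*step);
}
/* the final values of the private copies are not written back */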
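And a sketch of one way to follow Z boson's advice while still keeping the results usable (an illustrative variant, not the only option): accumulate into a task-local variable, store it to the shared variable once at the end, and print the sums after the parallel region so that -O3 cannot remove the loops.

#pragma omp task
{
    double local = 0.0;   /* task-local accumulator, no false sharing inside the loop */
    for (unsigned long long i = 0; i < numSteps; i++)
        local += 4.0/(1. + (i + .5)*step*(i + .5)*step);
    sum1 = local;         /* single write to the shared variable */
}
#pragma omp task
{
    double local = 0.0;
    for (unsigned long long i = 0; i < numSteps; i++)
        local += 4.0/(1. + (i + .5)*step*(i + .5)*step);
    sum2 = local;
}
/* ...after the parallel region, print or otherwise use sum1 and sum2;
   compile with e.g. gcc -fopenmp -O3 */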

0 Answers