
I am trying to optimize a C program using OpenMP. I am able to parallelize a simple for loop like this:

#pragma omp parallel
#pragma omp for
    for (i = 0; i < size; i++)
    {
        Y[i] = i * 0.3;
        Z[i] = -i * 0.4;
    }  

where X, Y and Z are float* arrays of length "size". This works fine, but there is another loop immediately after it:

 for (i = 0; i < size; i++)
    {
        X[i] += Z[i] * Y[i] * Y[i] * 10.0;
        sum += X[i];
    }
printf("Sum =%d\n",sum);  

I am not sure how to parallelize the above loop. I am compiling the program with gcc -fopenmp filename and running the executable with ./a.out. I hope that is enough to see a performance improvement.

I added #pragma omp parallel for reduction(+ \ : sum) on top of the second loop, and it does run faster and produce the correct output. I need expert input on how to parallelize the above while avoiding false sharing. Is the above directive correct, or is there a better alternative to parallelize it and make it faster?
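
For reference, here is roughly how I applied it, as a sketch (I am assuming here that sum is a floating-point variable, hence the %f):

#pragma omp parallel for reduction(+ : sum)
    for (i = 0; i < size; i++)
    {
        X[i] += Z[i] * Y[i] * Y[i] * 10.0;  /* each X[i] is written by exactly one thread */
        sum += X[i];                        /* each thread adds into its own private copy of sum */
    }
printf("Sum = %f\n", sum);                  /* the per-thread copies of sum have been combined by this point */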

  • I'm not very familiar with OpenMP, but I'm pretty sure the line `sum += X[i];` causes a race condition. – S3gfault Nov 08 '22 at 18:53
  • @S3gfault yeah thanks for pointing that out. I got that. Is there any way to parallelize it? – nicku Nov 08 '22 at 18:56
  • Sum all `X[i]` outside, in a separate loop? – Nelfeal Nov 08 '22 at 19:07
  • So `Y[i]` is `i * 0.3` and `Z[i]` is `-i * 0.4`? Then `Z[i] * Y[i] * Y[i] * 10.0` is `-i * 0.4 * i * 0.3 * i * 0.3 * 10` = `-i*i*i*.36`? Then, after the second loop, `sum` is the sum of the initial `X[i]` (which you have not shown) plus the sum of `-i*i*i*.36`, for `i` from `0` to `size-1`. [That sum is `-0.09*(size-1)*(size-1)*size*size`](https://www.wolframalpha.com/input?i=sum%28-i*i*i*.36%29+for+i+from+0+to+n-1). So all you need to do to calculate `sum` is add all the initial `X[i]` and then add `-0.09*(size-1)*(size-1)*size*size`. – Eric Postpischil Nov 08 '22 at 19:09
  • What is your goal? To learn OpenMP, or to do this simple calculation as fast as possible? – Laci Nov 08 '22 at 19:18
  • _"Need expert input."_. Be more specific please. – ryyker Nov 08 '22 at 19:19
  • No, false sharing is not a problem, but your code is memory-bound, not compute-bound. But it is not clear what your question is. – Laci Nov 08 '22 at 19:24
  • @ryyker parallelization and avoiding false sharing – nicku Nov 08 '22 at 19:25
  • Add that to your post. It is still vague. When you run your code, what specifically is not working the way you want it to? Be specific; for one small example, _"produces the wrong answer"_. Edit in the value you expect. This really should be a [mcve]. – ryyker Nov 08 '22 at 19:28
  • @Laci parallelize using openMP. That reduction directive seems to be doing it. But I am not sure about it or if there is a better way – nicku Nov 08 '22 at 19:28
  • @ryyker I removed the confusing line. That was not required. I just need to parallelize the second loop. – nicku Nov 08 '22 at 19:30
  • "a better way" to achieve what? Determine the sum? Learn OpenMP? Make this code as fast as possible using OpenMP directives only? Finish your homework? – Laci Nov 08 '22 at 19:32
  • @Laci Why is false sharing not a problem here? Will the `reduction` directive that I added cause `false sharing`? – nicku Nov 08 '22 at 19:35
  • Got that. I think it helps. But learn from the other comments how to give the people you're asking for help the information they will need in order to help you. Don't leave it for people to guess what you already know, or don't know. That generally works best if the code is presented in a [mcve], including your questions, expectations, and known hindrances to meeting your expectations. – ryyker Nov 08 '22 at 19:39
  • Read this: [Is the reduction in OpenMP safe from false sharing?](https://stackoverflow.com/questions/49630440/is-the-reduction-in-openmp-safe-from-false-sharing) – Laci Nov 08 '22 at 19:40
  • @nicku as far as I can tell, your OpenMP directive causes neither false sharing nor race conditions. OpenMP generates an accumulator (`sum`) for each thread, performs the operation `X[i] += Z[i] * Y[i] * Y[i] * 10.0;` for all its values, then globally reduces the subresults. – RawkFist Nov 08 '22 at 19:41
  • @nicku do you use braces after `#pragma omp parallel` enclosing both for-loops? If not, the spawned threads will be joined after the first for loop and relaunched for the second one, which gives unnecessary overhead. – RawkFist Nov 08 '22 at 19:47
  • Is the above directive correct? Yes, provided that you correct the typo (the stray `\`). Any better alternative to parallelize? Consider @RawkFist's comment (and combine it with @Eric's). Make it faster? Sure, the fastest is `printf("Sum =%f\n", -0.09*(size-1)*(size-1)*size*size);`. PS: Is your `Sum` really an integer? – Laci Nov 08 '22 at 20:47 (a sketch combining these suggestions follows the comments)
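
Putting the suggestions from the comments together (one parallel region spanning both loops, and a reduction for sum), a sketch of the combined version might look like the following. The size, the zero-initialization of X, the double type for sum, and the omp_get_wtime timing are assumptions added for illustration; none of them are shown in the question.

/* Sketch only, not the question's exact code.
   Compile: gcc -O2 -fopenmp file.c   (the question uses plain gcc -fopenmp) */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int size = 1000000;            /* example size; the question does not give one */
    float *X = calloc(size, sizeof *X);  /* assumption: X starts as all zeros */
    float *Y = malloc(size * sizeof *Y);
    float *Z = malloc(size * sizeof *Z);
    double sum = 0.0;                    /* assumption: a floating-point sum */

    if (!X || !Y || !Z)
        return 1;

    double t0 = omp_get_wtime();         /* wall-clock time, to check the speedup */

    /* One parallel region for both loops, so the threads are spawned only once. */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < size; i++)
        {
            Y[i] = i * 0.3;
            Z[i] = -i * 0.4;
        }
        /* The implicit barrier after the loop above is kept (no nowait), so every
           Y[i] and Z[i] is written before any thread reads it in the next loop. */

        /* reduction(+ : sum) gives each thread a private copy of sum and adds the
           copies together at the end of the loop: no race on sum, and no false
           sharing of a shared accumulator. */
        #pragma omp for reduction(+ : sum)
        for (int i = 0; i < size; i++)
        {
            X[i] += Z[i] * Y[i] * Y[i] * 10.0;
            sum += X[i];
        }
    }

    double t1 = omp_get_wtime();
    printf("Sum = %f (computed in %f s)\n", sum, t1 - t0);

    /* Closed form from the comments (valid only if X starts at all zeros); it will
       agree with the loop only approximately because X, Y and Z are floats. */
    printf("Closed form = %f\n", -0.09 * (size - 1.0) * (size - 1.0) * size * size);

    free(X);
    free(Y);
    free(Z);
    return 0;
}

Whether this beats two separate parallel regions mostly depends on memory bandwidth, since the code is memory-bound as Laci notes above, so it is worth timing both variants.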

0 Answers