
I am trying to optimize a C program using OpenMP. I am able to parallelize a simple for loop like this:

#pragma omp parallel
#pragma omp for
    for (i = 0; i < size; i++)
    {
        Y[i] = i * 0.3;
        Z[i] = -i * 0.4;
    }  

where X, Y and Z are float* arrays of length "size". This works fine, but there is another loop immediately after it:

 for (i = 0; i < size; i++)
    {
        X[i] += Z[i] * Y[i] * Y[i] * 10.0;
        sum += X[i];
    }
printf("Sum =%d\n",sum);  

I am not sure how to parallelize the above loop. I am compiling the program with gcc -fopenmp filename and running the executable with ./a.out. I hope that is enough to see a performance improvement.

I added #pragma omp parallel for reduction(+ \ : sum) on top of the second loop, and it does run faster and produce the correct output. I need expert input on how to parallelize the above while avoiding false sharing. Is the above directive correct, or is there a better alternative to parallelize it and make it faster?
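
For reference, here is roughly how I applied it, as a sketch (I am assuming here that sum is a floating-point variable, hence the %f):

#pragma omp parallel for reduction(+ : sum)
    for (i = 0; i < size; i++)
    {
        X[i] += Z[i] * Y[i] * Y[i] * 10.0;  /* each X[i] is written by exactly one thread */
        sum += X[i];                        /* each thread adds into its own private copy of sum */
    }
printf("Sum = %f\n", sum);                  /* the per-thread copies of sum have been combined by this point */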

  • I'm not very familiar with OpenMP, but I'm pretty sure the line `sum += X[i];` causes a race condition. – S3gfault Nov 08 '22 at 18:53
  • @S3gfault yeah thanks for pointing that out. I got that. Is there any way to parallelize it? – nicku Nov 08 '22 at 18:56
  • Sum all `X[i]` outside, in a separate loop? – Nelfeal Nov 08 '22 at 19:07
  • So `Y[i]` is `i * 0.3` and `Z[i]` is `-i * 0.4`? Then `Z[i] * Y[i] * Y[i] * 10.0` is `-i * 0.4 * i * 0.3 * i * 0.3 * 10` = `-i*i*i*.36`? Then, after the second loop, `sum` is the sum of the initial `X[i]` (which you have not shown) plus the sum of `-i*i*i*.36`, for `i` from `0` to `size-1`. [That sum is `-0.09*(size-1)*(size-1)*size*size`](https://www.wolframalpha.com/input?i=sum%28-i*i*i*.36%29+for+i+from+0+to+n-1). So all you need to do to calculate `sum` is add all the initial `X[i]` and then add `-0.09*(size-1)*(size-1)*size*size`. – Eric Postpischil Nov 08 '22 at 19:09
  • What is your goal? To learn OpenMP, or to do this simple calculation as fast as possible? – Laci Nov 08 '22 at 19:18
  • _"Need expert input."_. Be more specific please. – ryyker Nov 08 '22 at 19:19
  • No, false sharing is not a problem, but your code is memory-bound, not compute-bound. But it is not clear what your question is. – Laci Nov 08 '22 at 19:24
  • @ryyker parallelization and avoiding false sharing – nicku Nov 08 '22 at 19:25
  • Add that to your post. It is still vague. When you run your code, what specifically is not working the way you want it to? Be specific; for one small example, _"produces the wrong answer"_. Edit in the value you expect. This really should be a [mcve]. – ryyker Nov 08 '22 at 19:28
  • @Laci parallelize using openMP. That reduction directive seems to be doing it. But I am not sure about it or if there is a better way – nicku Nov 08 '22 at 19:28
  • @ryyker I removed the confusing line. That was not required. I just need to parallelize the second loop. – nicku Nov 08 '22 at 19:30
  • "a better way" to achieve what? Determine the sum? Learn OpenMP? Make this code as fast as possible using OpenMP directives only? Finish your homework? – Laci Nov 08 '22 at 19:32
  • @Laci Why is false sharing not a problem here? Will the `reduction` directive that I added cause `false sharing`? – nicku Nov 08 '22 at 19:35
  • Got that. I think it helps. But learn from the other comments how to give the people you're asking for help the information they will need in order to help you. Don't leave it for people to guess what you already know, or don't know. That generally works best if the code is presented in a [mcve], including your questions, expectations, and known hindrances to meeting your expectations. – ryyker Nov 08 '22 at 19:39
  • Read this: [Is the reduction in OpenMP safe from false sharing?](https://stackoverflow.com/questions/49630440/is-the-reduction-in-openmp-safe-from-false-sharing) – Laci Nov 08 '22 at 19:40
  • @nicku as far as I can tell, your OpenMP directive causes neither false sharing nor race conditions. OpenMP generates an accumulator (`sum`) for each thread, performs the operation `X[i] += Z[i] * Y[i] * Y[i] * 10.0;` for all its values, then globally reduces the subresults. – RawkFist Nov 08 '22 at 19:41
  • @nicku do you use braces after `#pragma omp parallel` enclosing both for-loops? If not, the spawned threads will be joined after the first for loop and relaunched for the second one, which gives unnecessary overhead. – RawkFist Nov 08 '22 at 19:47
  • Is the above directive correct? Yes, provided that you correct the typo (the stray `\`). Any better alternative to parallelize? Consider @RawkFist's comment (and combine it with @Eric's). Make it faster? Sure, the fastest is `printf("Sum =%f\n", -0.09*(size-1)*(size-1)*size*size);`. PS: Is your `Sum` really an integer? – Laci Nov 08 '22 at 20:47 (a sketch combining these suggestions follows the comments)
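
Putting the suggestions from the comments together (one parallel region spanning both loops, and a reduction for sum), a sketch of the combined version might look like the following. The size, the zero-initialization of X, the double type for sum, and the omp_get_wtime timing are assumptions added for illustration; none of them are shown in the question.

/* Sketch only, not the question's exact code.
   Compile: gcc -O2 -fopenmp file.c   (the question uses plain gcc -fopenmp) */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int size = 1000000;            /* example size; the question does not give one */
    float *X = calloc(size, sizeof *X);  /* assumption: X starts as all zeros */
    float *Y = malloc(size * sizeof *Y);
    float *Z = malloc(size * sizeof *Z);
    double sum = 0.0;                    /* assumption: a floating-point sum */

    if (!X || !Y || !Z)
        return 1;

    double t0 = omp_get_wtime();         /* wall-clock time, to check the speedup */

    /* One parallel region for both loops, so the threads are spawned only once. */
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < size; i++)
        {
            Y[i] = i * 0.3;
            Z[i] = -i * 0.4;
        }
        /* The implicit barrier after the loop above is kept (no nowait), so every
           Y[i] and Z[i] is written before any thread reads it in the next loop. */

        /* reduction(+ : sum) gives each thread a private copy of sum and adds the
           copies together at the end of the loop: no race on sum, and no false
           sharing of a shared accumulator. */
        #pragma omp for reduction(+ : sum)
        for (int i = 0; i < size; i++)
        {
            X[i] += Z[i] * Y[i] * Y[i] * 10.0;
            sum += X[i];
        }
    }

    double t1 = omp_get_wtime();
    printf("Sum = %f (computed in %f s)\n", sum, t1 - t0);

    /* Closed form from the comments (valid only if X starts at all zeros); it will
       agree with the loop only approximately because X, Y and Z are floats. */
    printf("Closed form = %f\n", -0.09 * (size - 1.0) * (size - 1.0) * size * size);

    free(X);
    free(Y);
    free(Z);
    return 0;
}

Whether this beats two separate parallel regions mostly depends on memory bandwidth, since the code is memory-bound as Laci notes above, so it is worth timing both variants.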

0 Answers