I am trying to optimize a C program using openMP. I am able to parallelize the simple for loop like
#pragma omp parallel
#pragma omp for
for (i = 0; i < size; i++)
{
Y[i] = i * 0.3;
Z[i] = -i * 0.4;
}
Where X, Y and Z are float* of size "size" This works fine, but there is another loop immediately after it:
for (i = 0; i < size; i++)
{
X[i] += Z[i] * Y[i] * Y[i] * 10.0;
sum += X[i];
}
printf("Sum =%d\n",sum);
I am not sure how to parallelize the above for the loop. I am compiling the program with the command gcc -fopenmp filename
and running the executable ./a.out I hope that is enough to reflect performance improvement.
I added #pragma omp parallel for reduction(+ \ : sum)
at top of the second loop, it indeed is running faster and producing correct output. Need expert input to parallelize the above and avoid false sharing? Is the above directive correct or any better alternative to parallelize and make it faster?