I have this fragment of code that calculates the force experienced by each unique pair of particles due to a potential form. I am trying to parallelize the nested for loops, however, the output that is expected is different from the correct output. The code is :
for(ic = 0; ic < (N_TOTAL-1); ic++)
{
for (jc = ic+1; jc < N_TOTAL; jc++)
{
rc_x = X[ic]-X[jc];
rc_z = Z[ic]-Z[jc];
/* Minimum image convention */
if (fabs(rc_x) > LxScaled/2.0)
{
rc_x = rc_x - SignR (LxScaled, rc_x);
}
if (fabs(rc_z) > LzScaled/2.0)
{
rc_z = rc_z - SignR (LzScaled, rc_z);
}
rij = sqrt(rc_x * rc_x + rc_z * rc_z);
rr = rr + (rc_x * rc_x + rc_z * rc_z);
/* Compute Force : Yukawa potential*/
second = exp (-rij * Xi ) * ((1.0*(1.0))/rij);
third = (Xi+1.0/rij);
a = second * third;
a_x[ic] = a_x[ic] + (rc_x/rij)*a;
a_x[jc] = a_x[jc] - (rc_x/rij)*a;
a_z[ic] = a_z[ic] + (rc_z/rij)*a;
a_z[jc] = a_z[jc] - (rc_z/rij)*a;
uVal = second;
uSum = uSum + (second);
}
}
}
Here X[i] and Z[i] are the position and the a_x[i] and a_z[i] are the x and z components of the force experienced by the ith particle. I have an OpenMP version 4.0 thus I cannot use reduction clause for the 1D array a_x[] and a_z[]. Thus I intend to use #pragma omp atomic
caluse to avoid any race conditions. Thus i use this to parallelize as :
#pragma omp parallel for schedule(guided, 8) reduction(+:rr, uSum) default(shared) private(ic, jc, rc_x, rc_z, rij, uVal, second, third, a)
for(ic = 0; ic < (N_TOTAL-1); ic++)
{
for (jc = ic+1; jc < N_TOTAL; jc++)
{
rc_x = X[ic]-X[jc];
rc_z = Z[ic]-Z[jc];
/* Minimum image convention */
if (fabs(rc_x) > LxScaled/2.0)
{
#pragma omp atomic
rc_x = rc_x - SignR (LxScaled, rc_x);
}
if (fabs(rc_z) > LzScaled/2.0)
{
#pragma omp atomic
rc_z = rc_z - SignR (LzScaled, rc_z);
}
rij = sqrt(rc_x * rc_x + rc_z * rc_z);
rr = rr + (rc_x * rc_x + rc_z * rc_z);
/* Compute Force : Yukawa potential*/
second = exp (-rij * Xi ) * ((1.0*(1.0))/rij);
third = (Xi+1.0/rij);
a = second * third;
#pragma omp atomic
a_x[ic] = a_x[ic] + (rc_x/rij)*a;
#pragma omp atomic
a_x[jc] = a_x[jc] - (rc_x/rij)*a;
#pragma omp atomic
a_z[ic] = a_z[ic] + (rc_z/rij)*a;
#pragma omp atomic
a_z[jc] = a_z[jc] - (rc_z/rij)*a;
uVal = second;
uSum = uSum + (second);
}
}
}
However I am not getting the correct output which is to be expected when the fragment of code is run in series. There are some deviation over time when the same force is computed over a longer period of time.
Where is the mistake that I am making, I try to use atomic update of force to eliminate any race condition that might occue. I am new to parallelization part and this is the 1t project I am trying to parallelize, Can anyon spot the mistake and point me in the right direction.
Edit: I also tried to reduce the two nested for loops into 1 however I am unable to procude the exact output as in serial.