I have a nested loop: (L and A are fully defined inputs)
#pragma omp parallel for schedule(guided) shared(L,A) \
reduction(+:dummy)
for (i=k+1;i<row;i++){
for (n=0;n<k;n++){
#pragma omp atomic
dummy += L[i][n]*L[k][n];
L[i][k] = (A[i][k] - dummy)/L[k][k];
}
dummy = 0;
}
And its sequential version:
for (i=k+1;i<row;i++){
for (n=0;n<k;n++){
dummy += L[i][n]*L[k][n];
L[i][k] = (A[i][k] - dummy)/L[k][k];
}
dummy = 0;
}
They both give different results. And parallel version is much slower than the sequential version.
What may cause the problem?
Edit:
To get rid of the problems caused by the atomic directive, I modified the code as follows:
#pragma omp parallel for schedule(guided) shared(L,A) \
private(i)
for (i=k+1;i<row;i++){
double dummyy = 0;
for (n=0;n<k;n++){
dummyy += L[i][n]*L[k][n];
L[i][k] = (A[i][k] - dummyy)/L[k][k];
}
}
But it also didn't work out the problem. Results are still different.