I read here that sequential memory consistency (seq_cst) "might be needed" to make sure an atomic update is viewed by all threads consistently in an OpenMP parallel region.
Consider the following MWE, which is admittedly trivial and could be realized with a reduction rather than atomics, but which illustrates a question that arose in a more complex piece of code:
#include <iostream>

int main()
{
    double a = 0;
    #pragma omp parallel for
    for (int i = 0; i < 10000000; ++i)
    {
        #pragma omp atomic
        a += 5.5;
    }
    std::cout.precision(17);
    std::cout << a << std::endl;
    return 0;
}
I compiled this with g++ -fopenmp -O3 using GCC versions 6 to 12 on an Intel Core i9-9880H CPU, and then ran it with 4 or 8 threads. It always prints the correct result:
55000000
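For completeness, the build and run steps look roughly like this (atomic_test.cpp is just a placeholder name for the file above):

g++ -fopenmp -O3 atomic_test.cpp -o atomic_test
OMP_NUM_THREADS=4 ./atomic_test
OMP_NUM_THREADS=8 ./atomic_test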
When adding seq_cst to the atomic directive (the exact variant is shown at the end of this post), the result is exactly the same. I would have expected the code without seq_cst to (occasionally) produce smaller results due to race conditions or an outdated view of memory. Is this hardware dependent? Is the code guaranteed to be free of race conditions even without seq_cst, and if so, why? Would the answer be different with a compiler that still implemented OpenMP 3.1, since atomics apparently worked somewhat differently there?
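To be explicit, the seq_cst variant only changes the atomic directive inside the loop; the rest of the program is unchanged:

    #pragma omp parallel for
    for (int i = 0; i < 10000000; ++i)
    {
        // request sequentially consistent ordering for the atomic update
        #pragma omp atomic seq_cst
        a += 5.5;
    }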