I read here that sequential memory consistency (seq_cst) "might be needed" to make sure an atomic update is viewed by all threads consistently in an OpenMP parallel region.

Consider the following MWE, which is admittedly trivial and could be realized with a reduction rather than atomics, but which illustrates my question that arose in a more complex piece of code:

#include <iostream>
int main()
{
  double a = 0;
#pragma omp parallel for
  for (int i = 0; i < 10000000; ++i)
  {
#pragma omp atomic
    a += 5.5;
  }
  std::cout.precision(17);
  std::cout << a << std::endl;
  return 0;
}

I compiled this with g++ -fopenmp -O3 using GCC versions 6 to 12 on an Intel Core i9-9880H CPU, and then ran it using 4 or 8 threads, which always correctly prints:

55000000

When adding seq_cst to the atomic directive, the result is exactly the same. I would have expected the code without seq_cst to (occasionally) produce smaller results due to race conditions / outdated memory view. Is this hardware dependent? Is the code guaranteed to be free of race conditions even without seq_cst, and if so, why? Would the answer be different when using a compiler that was still based on OpenMP 3.1, as that apparently worked somewhat differently?
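
For reference, the three variants discussed above (plain atomic, atomic with seq_cst, and the reduction alternative) can be sketched side by side. The function names `sum_atomic`, `sum_atomic_seq_cst` and `sum_reduction` are hypothetical and not part of the original MWE. The sketch assumes a compiler supporting OpenMP 4.0 or later for the `seq_cst` clause. Every partial sum is a multiple of 5.5 that a double represents exactly at these sizes, so the order of the additions does not affect the result.

```cpp
// Hypothetical helpers for comparison; compile with e.g. g++ -fopenmp -O3.

// Default (relaxed) atomic update: atomicity only, no extra ordering.
double sum_atomic(int n)
{
  double a = 0;
#pragma omp parallel for
  for (int i = 0; i < n; ++i)
  {
#pragma omp atomic
    a += 5.5;
  }
  return a;
}

// Same update with the seq_cst clause (requires OpenMP 4.0 or later).
double sum_atomic_seq_cst(int n)
{
  double a = 0;
#pragma omp parallel for
  for (int i = 0; i < n; ++i)
  {
#pragma omp atomic seq_cst
    a += 5.5;
  }
  return a;
}

// Reduction: each thread accumulates privately, combined at the end.
double sum_reduction(int n)
{
  double a = 0;
#pragma omp parallel for reduction(+ : a)
  for (int i = 0; i < n; ++i)
  {
    a += 5.5;
  }
  return a;
}
```

All three functions should return the same value for a given `n`; the reduction variant avoids contention on `a` and is normally the faster choice.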

Roman
  • In this case, the default relaxed memory model is completely fine. Relaxed atomic operations guarantee atomicity (so a data race cannot occur), but they don't impose any ordering constraints or memory barriers (OpenMP's flush operation). That is not a problem here, so you will always obtain correct results. You have to familiarize yourself with the C++11 memory model. Unless you are an expert in it, I suggest using `seq_cst` (the sequentially consistent model). – Laci Feb 17 '23 at 14:22
  • Thanks for your reply, @Laci. Would you mind elaborating why it is not a problem here? What would need to be different to make `seq_cst` needed? I'm asking because the real code I'm working with is more involved than the MWE above, and I'm concerned about compatibility with OpenMP versions prior to 4.0, which is the one that introduced `seq_cst`, for reasons that are outside of my control. Ideally, the program should compile and run fine using just OpenMP 3.1. – Roman Feb 17 '23 at 17:04
  • Well, memory consistency models are not an easy topic and cannot be explained in a comment. Do not worry, it will run fine with OpenMP 3.1, because in OpenMP 3.1 there is an implicit flush at the entry to and exit from the atomic operation. – Laci Feb 17 '23 at 17:58
  • You aren't using the intermediate results inside the loop, so ordering wrt. anything else is irrelevant, and ordering is what `seq_cst` gives you. Atomicity alone will give you the correct final result after the loop, avoiding data-race UB. Most algorithms don't need `seq_cst`; acquire / release is usually enough for sending data across threads. (https://preshing.com/20120913/acquire-and-release-semantics/ and https://preshing.com/20120612/an-introduction-to-lock-free-programming/#sequential-consistency / https://preshing.com/20120515/memory-reordering-caught-in-the-act/) – Peter Cordes Feb 17 '23 at 22:47
  • I think I understand better now, thanks for your explanations and links. Basically, there's only one memory location being atomically updated here (that of `a`), so sequential consistency is not needed. – Roman Feb 20 '23 at 12:38
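
To illustrate the distinction drawn in the comments, here is a minimal sketch of a case where ordering relative to a second memory location does matter: one thread writes ordinary data and then sets an atomic flag, and the other threads wait on the flag before reading the data. The function `publish_and_read` and the variables `payload` and `ready` are hypothetical. The sketch assumes OpenMP 4.0 or later for `atomic read` / `atomic write` with `seq_cst`. A run that reports zero mismatches does not by itself prove correctness; the guarantee comes from the flushes implied by `seq_cst`.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical example: thread 0 publishes `payload` through the flag `ready`;
// every thread then waits for the flag and checks what it reads.
// Returns the number of threads that saw a stale payload (expected: 0).
int publish_and_read()
{
  int payload = 0;     // ordinary, non-atomic shared data
  int ready = 0;       // flag, accessed only through atomic constructs
  int mismatches = 0;

#pragma omp parallel shared(payload, ready, mismatches)
  {
    int tid = 0;
#ifdef _OPENMP
    tid = omp_get_thread_num();
#endif

    if (tid == 0)
    {
      payload = 42;    // plain write that must become visible before the flag
#pragma omp atomic write seq_cst
      ready = 1;       // seq_cst implies a flush, ordering the write above
    }

    // Spin until the flag is observed; thread 0 passes immediately.
    int seen = 0;
    while (!seen)
    {
#pragma omp atomic read seq_cst
      seen = ready;
    }

    // Ordered after the flag read, so the published value must be visible.
    if (payload != 42)
    {
#pragma omp atomic
      ++mismatches;
    }
  }
  return mismatches;
}
```

Here two locations are involved (`payload` and `ready`), so atomicity of the flag alone is not enough: with a relaxed flag and no flush, a waiting thread would not be guaranteed to see the new `payload`. The MWE in the question has only the single location `a`, which is why it does not need this.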
