Yes, both.
The C++ memory model requires that atomic operations follow certain semantics, which depend on the specified memory ordering parameter. So the compiler has to emit code which, when executed, behaves according to those semantics.
For example, taking code like:
std::atomic<int> x;
int y, tmp;
if (x.load(std::memory_order_acquire) == 5) {
    tmp = y;
}
On a typical machine, the compiler would need to:

- Not reorder the loads of x and y at compile time. In other words, it should emit a load instruction for x and a load instruction for y, such that the first comes before the second in program order.

- Ensure that the loads of x and y become visible in that order. If the machine is capable of out-of-order execution, speculative loads, or any other feature that could cause two loads to become visible out of program order, then the compiler must emit code that prevents that from happening in this instance.
What that code looks like depends on the machine in question. Possibilities include:

- Nothing special, because the machine doesn't do this particular kind of reordering. So x and y will just be loaded by ordinary load instructions, with nothing extra. This is the case on x86, for instance, where "all loads are acquire".

- Using a special form of the load instruction which inhibits reordering. For instance, on AArch64, the load of x would be done with the ldapr or ldar instruction instead of the ordinary ldr.

- Inserting a special memory barrier instruction between the two loads, like ARM's dmb.
In the vast majority of code, the memory ordering parameter is specified as a compile-time constant, because the programmer knows statically what ordering is required, and so the compiler can emit the instructions appropriate to that particular ordering.
In the unusual case where the ordering parameter is not a constant, the compiler has to emit code that behaves properly no matter what value is specified. Usually the compiler simply treats the ordering parameter as memory_order_seq_cst, since that is stronger than all the others: a seq_cst operation satisfies all the semantics required by the weaker orderings (and more besides). This saves the cost of testing the value of the ordering parameter at runtime and branching accordingly, a cost which likely outweighs the potential savings of doing the operation with a weaker ordering.
But if the compiler did choose to test and branch, it would typically have to assume the worst case for the purposes of optimizing surrounding code. For instance, on AArch64, for x.load(order) it might emit a chunk of code like the following:
int t;
if (order == std::memory_order_relaxed)
    LDR t, [x]
else if (order == std::memory_order_acquire)
    LDAPR t, [x]
else if (order == std::memory_order_seq_cst)
    LDAR t, [x]
else
    abort();
if (t == 5)
    LDR tmp, [y]
However, it would need to ensure that the load of y remained at the end of this chunk of code (in program order). If order were equal to std::memory_order_relaxed, then it would be okay to execute the load of y before the load of x, but not if order were std::memory_order_acquire or stronger.
On the other hand, it could conceivably emit
int t, t2;
if (order == std::memory_order_relaxed) {
    LDR t2, [y]
    LDR t, [x]
} else if (order == std::memory_order_acquire) {
    LDAPR t, [x]
    LDR t2, [y]
} else if (order == std::memory_order_seq_cst) {
    LDAR t, [x]
    LDR t2, [y]
} else {
    abort();
}
if (t == 5)
    tmp = t2;
but we are now well outside the range of transformations that a real-world compiler would actually perform.
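For comparison, here is what such a test-and-branch dispatch looks like as real C++ rather than pseudo-assembly (a sketch; the function name is illustrative, and consume is omitted for simplicity):

```cpp
#include <atomic>
#include <cstdlib>

// Dispatch on a runtime ordering, mirroring the pseudo-assembly above.
// On AArch64 the three cases could compile to LDR, LDAPR, and LDAR
// respectively.
int load_dispatch(const std::atomic<int>& x, std::memory_order order) {
    switch (order) {
    case std::memory_order_relaxed:
        return x.load(std::memory_order_relaxed);
    case std::memory_order_acquire:
        return x.load(std::memory_order_acquire);
    case std::memory_order_seq_cst:
        return x.load(std::memory_order_seq_cst);
    default:
        std::abort();  // release/acq_rel are invalid orderings for a load
    }
}
```

Even here, a compiler optimizing the surrounding code must assume the strongest ordering actually reachable through the switch, which is part of why real implementations just use seq_cst instead.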