
My understanding is that a spinlock can be implemented using C++11 atomics with an acquire read-modify-write (a CAS, or the test-and-set below) on lock and a release-store on unlock, something like this:

#include <atomic>

class SpinLock {
 public:
  void Lock() {
    while (l_.test_and_set(std::memory_order_acquire));
  }

  void Unlock() {
    l_.clear(std::memory_order_release);
  }

 private:
  std::atomic_flag l_ = ATOMIC_FLAG_INIT;
};

Consider its use in a function that acquires a lock and then does a blind write to some shared location:

int g_some_int_;

void BlindWrite(int val) {
  static SpinLock lock_;

  lock_.Lock();
  g_some_int_ = val;
  lock_.Unlock();
}

I'm interested in how the compiler is restricted in translating this to generated assembly code.

I understand why the write to g_some_int_ can't migrate past the end of the critical section in the compiler's output -- that would mean that the write isn't guaranteed to be seen by the next thread that acquires the lock, which is guaranteed by the release/acquire ordering.

But what prevents the compiler from moving it to before the acquire-CAS of the lock flag? I thought that compilers were allowed to re-order writes to different memory locations when generating assembly code. Is there some special rule that prevents a write from being re-ordered to before an atomic store that precedes it in program order?
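
To make the concern concrete, the hypothetical hoisted version I'm asking about would look something like this (purely an illustration of the transformation; I'm not claiming any particular compiler does this):

// Hypothetical transformation: the plain store hoisted above the acquire
// of the lock flag. Is the compiler allowed to emit code equivalent to this?
void BlindWriteHoisted(int val) {
  static SpinLock lock_;

  g_some_int_ = val;
  lock_.Lock();
  lock_.Unlock();
}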

I'm looking for a language lawyer answer here, preferably covering std::atomic as well as std::atomic_flag.
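
For reference, since I mention std::atomic: here is roughly the std::atomic<bool> equivalent I have in mind, as an untested sketch using a compare-exchange loop for the acquire (the class name is mine):

#include <atomic>

class CasSpinLock {
 public:
  void Lock() {
    bool expected = false;
    // Acquire-CAS: retry until we atomically flip the flag from false to true.
    while (!l_.compare_exchange_weak(expected, true,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed)) {
      expected = false;  // compare_exchange_weak updates expected on failure.
    }
  }

  void Unlock() {
    l_.store(false, std::memory_order_release);
  }

 private:
  std::atomic<bool> l_{false};
};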

Edit to include something from the comments which perhaps asks the question more clearly. The heart of the question is: which part of the standard says that the abstract machine must observe l_ being false before it writes to g_some_int_?

I suspect the answer is either "writes can't be lifted above potentially infinite loops" or "writes can't be lifted above atomic writes". Perhaps it's even "you're wrong that writes can ever be re-ordered at all". But I'm looking for a specific reference in the standard.

jacobsa
  • The `memory_order_acquire` on the lock and `memory_order_release` on the unlock accomplish this. See `memory_order`. – Raymond Chen Jul 23 '14 at 04:55
  • I don't see why the compiler wouldn't be allowed to move the assignment after `Unlock`, either. In your example, there's no way for a valid, race-free program to ever observe the value of `g_some_int_` anyway, unless there's some additional, external synchronization. – Igor Tandetnik Jul 23 '14 at 04:56
  • @RaymondChen: Please read my question again, grepping for "which is guaranteed by the release/acquire ordering". I can see why what you say is true for one reordering, but not for the other. – jacobsa Jul 23 '14 at 04:56
  • @IgorTandetnik: You're right at the C++ level, but I'm asking about the location of the store in the generated assembly code. There's not the same notion of races there, and re-ordering in that way would cause incorrect behavior according to the release/acquire ordering model. I just don't see where the memory model guarantees the same thing for the `Lock` call. – jacobsa Jul 23 '14 at 04:58
  • 1
  • The memory model is a C++-level notion. It doesn't say anything about "generated assembly code", only about observable behavior of a program. The compiler is free to generate any assembly that leads to the observable behavior of the resulting program matching one of the possible executions of the abstract machine specified by the standard. For example, a compiler capable of whole program optimization could notice that you never ever read `g_some_int_`, and optimize away the whole `BlindWrite` function to a no-op. – Igor Tandetnik Jul 23 '14 at 05:00
  • @IgorTandetnik: I completely agree; that is the background for my question. The question itself is: which part of the standard says that the abstract machine must observe `l_` being unset before it writes `g_some_int_`? I suspect the answer is either "writes can't be lifted above potentially infinite loops" or "writes can't be lifted above atomic writes". But I'm looking for a specific reference in the standard. – jacobsa Jul 23 '14 at 05:16
  • [intro.multithread.10] shows that the release in `Unlock` is dependency-ordered before the acquire in `Lock`. [intro.execution.14] shows that the `Lock` is sequenced before the load of `g_some_int_`, and that the store to `g_some_int_` is sequenced before the `Unlock`. [atomics.order.2] shows that the release in `Unlock` is synchronized with the acquire in `Lock`. Therefore [intro.multithread.11] the write to `g_some_int_` inter-thread-happens before the read of `g_some_int_`. – Raymond Chen Jul 23 '14 at 06:21
  • @RaymondChen: Thanks, but there was no read of `g_some_int_` in my question. :-) I was only asking about hoisting writes. The answer from Anthony covers what I was looking for. – jacobsa Jul 23 '14 at 10:17
  • `which part of the standard says that the abstract machine must observe l_ being unset before it writes g_some_int_?` **1.9/14** Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated. **6/1** Except as indicated, statements are executed in sequence. – Igor Tandetnik Jul 23 '14 at 12:49
  • 1
  • If there are no reads from `g_some_int_`, then the reordering is not observable, so the issue is moot. The compiler could optimize out the `g_some_int_` variable entirely. I'm assuming that somewhere else in the code there is a protected read of `g_some_int_`. – Raymond Chen Jul 23 '14 at 14:43

1 Answer


Suppose you have two functions that use your spinlock:

SpinLock sl;
int global_int=0;

int read(){
    sl.Lock();
    int res=global_int;
    sl.Unlock();
    return res;
}

void write(int val){
    sl.Lock();
    global_int=val;
    sl.Unlock();
}

If two calls to BlindWrite (or, equivalently, two calls to the write function above) happen concurrently on separate threads, then one (call it A) will acquire the lock; the other (B) will spin in the loop in Lock.

A then writes to g_some_int and calls Unlock, which performs a clear that is a store-release. The write to g_some_int is sequenced-before the call to clear, since both are in the same thread.

B then wakes in Lock, and this time the test_and_set call returns false. This is a load-acquire that reads the value stored by the call to clear, so the call to clear synchronizes-with this call to test_and_set.

The test_and_set call in Lock is a load-acquire, and is sequenced-before the write to g_some_int in BlindWrite, since both are in the same thread.

Since the write in thread A is sequenced-before the call to clear, which synchronizes-with the call to test_and_set in thread B, which is in turn sequenced-before the write in thread B, the write in thread A happens-before the write in thread B.

If the compiler hoisted the write to g_some_int above the call to Lock, then it would be possible for the write from thread B to occur before the write in thread A. This would violate the happens-before ordering, so is not permitted.

In general, this means that the compiler cannot hoist anything above a load-acquire, since it may violate the happens-before ordering.
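
For concreteness, here is a minimal, untested sketch of that two-thread scenario; it assumes the SpinLock class from the question, with the lock and the shared int pulled out as globals for clarity:

#include <thread>

SpinLock g_lock;
int g_some_int = 0;

void BlindWrite(int val) {
  g_lock.Lock();     // acquire on the lock flag
  g_some_int = val;  // sequenced after Lock() and before Unlock()
  g_lock.Unlock();   // store-release on the lock flag
}

int main() {
  std::thread a([] { BlindWrite(1); });  // thread A
  std::thread b([] { BlindWrite(2); });  // thread B
  a.join();
  b.join();
  // Whichever call ran second read the flag value released by the first,
  // so the first write happens-before the second; g_some_int ends up as 1 or 2.
}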

Anthony Williams
  • This is how I understood it, correct me if I'm wrong: _Sequenced-before_ basically applies to all operations executing on the same thread. The reason the compiler is allowed to reorder those instructions in the first place is that it can violate _sequenced-before_ ordering as long as it obeys the _as-if rule_. In a single-threaded context that simply means not reordering a read before a write to the same data. In a multithreaded context, it also means taking into account _synchronizes-with_ constraints from atomics and locks. – ComicSansMS Jul 23 '14 at 09:15
  • This is pretty much what I was looking for. I can't find a sentence in the standard that specifically says the abstract machine has to obey the happens before relationship in its side effects, but it seems pretty natural to assume that is implied. Thank you! – jacobsa Jul 23 '14 at 10:15
  • **1.9/13** Given any two evaluations `A` and `B`, if `A` is sequenced before `B`, then the execution of `A` shall precede the execution of `B`. – Igor Tandetnik Jul 23 '14 at 12:54
  • The *happens-before* ordering is covered in 1.9 and 1.10, and how it relates to atomics is in 29.3. It is quite hard to follow, but it is all there. The *abstract machine* has to follow the requirements, because the requirements for the abstract machine are what the standard specifies! – Anthony Williams Jul 24 '14 at 08:21
  • @IgorTandetnik: Thanks, that plus the quote you gave in the other comment thread ("Except as indicated, statements are executed in sequence") is exactly what I was looking for. Anthony, I agree it's natural that it obeys happens-before. :-) I was just wondering if it was stated explicitly somewhere. – jacobsa Jul 25 '14 at 04:45
  • @AnthonyWilliams: I've asked a related question [here](https://stackoverflow.com/q/45475241/1505451) if you're interested in having a look... – jacobsa Aug 03 '17 at 06:08
  • @AnthonyWilliams What are the elementary operations of the evaluation of a program where the abstract machine requirements have to be followed? – curiousguy Dec 15 '19 at 07:38
  • @curiousguy Abstract machine operations are I/O, and access to `volatile` objects: https://eel.is/c++draft/intro.abstract#6 – Anthony Williams Dec 16 '19 at 08:30
  • @AnthonyWilliams I don't understand how that "abstract machine" is supposed to do computations. – curiousguy Dec 16 '19 at 18:41
  • @curiousguy by magic, or any other mechanism that produces the correct sequence of I/O and `volatile` accesses. – Anthony Williams Dec 17 '19 at 08:37
  • @AnthonyWilliams OK but that doesn't say what the "correct" sequence is! "_The abstract machine has to follow the requirements_" It isn't clear what is required for an MT program (unlike non threaded) – curiousguy Dec 17 '19 at 08:43
  • There isn't a single correct sequence, except in very special cases. A possible correct sequence is anything that doesn't violate the constraints in https://eel.is/c++draft/basic.exec – Anthony Williams Dec 17 '19 at 08:48