Can compile-time memory reordering lead to deadlocks?

Question

While watching this talk about the implementation of C++11 atomics in LLVM, I came across this piece of code:

-- Initially --
int x = 0;
std::atomic<bool> flag1{false}, flag2{false};


-- Thread 1 --
x = 42;
flag1.store(true, std::memory_order_release);

while(!flag2.load(std::memory_order_acquire));
x = 43;


-- Thread 2 --
while(!flag1.load(std::memory_order_acquire));
printf("%d", x);
flag2.store(true, std::memory_order_release);

I consider this code data-race free (as is also stated by the speaker): it will never print anything but 42.

However, I am not sure that it will ever print 42. My question is: wouldn't a compiler be allowed to reorder the store to flag1 past the while-loop in Thread 1, so that both threads would deadlock? If not, what part of the C++11 standard prevents this kind of behavior?

@ParkYoung-Bae: The question is, could both the store and the release fence pass through following loop (loading an unrelated variable with an acquire fence)? — Mike Seymour, Dec 13 '14 at 11:25
@ParkYoung-Bae Also, if I understand correctly, the release-fence would have to be put *before* the store (if one would be willing to implement a store-release in terms of a release fence). — levzettelin, Dec 13 '14 at 11:30

score 0 · Answer 1 · answered Dec 13 '14 at 12:01

The compiler must NEVER move a store to any (externally visible) value past a release fence, and must not move reads above an acquire fence. This is the main purpose of fences.

There may be other semantics involved here too, for example if caches need to be flushed, a release fence will flush any "writes" pending from this CPU out to main memory. Similarly, an acquire fence will need to flush all or selected regions so that new values are read in before the next read is issued.

However, all modern CPUs have coherent memory between CPUs, so this is not an issue - it may be an issue in some unusual/small or old CPUs that have a cache that assumes other CPUs won't read the same memory as they have in the cache. If you are working with non-uniform processors that aren't coherent, then cache maintenance becomes an issue too - you will need to make sure caches are flushed in the correct ways. Again, this is a somewhat specialized area, but it can be an important factor in multiprocessor systems.
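For reference, here is a minimal sketch of the program from the question written as a compilable function. The name `run_handshake` and the local variable `seen` (used instead of `printf`) are assumptions made for illustration and are not part of the original code. If both threads finish, the acquire/release pairs on `flag1` and `flag2` order the read of `x` after `x = 42` and before `x = 43`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the two-thread handshake from the question once
// and returns the value of x that thread 2 read.
int run_handshake() {
    int x = 0;
    std::atomic<bool> flag1{false}, flag2{false};
    int seen = -1;  // written only by thread 2, read only after join()

    std::thread t1([&] {
        x = 42;
        flag1.store(true, std::memory_order_release);

        // Spin until thread 2 signals that it has read x.
        while (!flag2.load(std::memory_order_acquire)) {}
        x = 43;
    });

    std::thread t2([&] {
        // Spin until thread 1 has published x = 42.
        while (!flag1.load(std::memory_order_acquire)) {}
        seen = x;
        flag2.store(true, std::memory_order_release);
    });

    t1.join();
    t2.join();
    assert(x == 43);  // thread 1's final write is visible after join()
    return seen;
}
```

Calling `run_handshake()` from `main` and printing the result corresponds to the `printf` in Thread 2 of the question.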

answered Dec 13 '14 at 12:01

Mats Petersson


If I understand correctly, release-fences in C++11 only prevent the reordering of loads and stores past any subsequent stores. Thus, if the release fence is immediately followed by a load, the compiler would still be allowed to reorder previous stores past that load. Also, you are not answering the question. – levzettelin Dec 13 '14 at 12:34
That's why the `while(!flag2.load(std::memory_order_acquire));` needs to be exactly an acquire fence, not (for example) a release fence. This requires that any preceding stores have been completed, before the load is performed. I will try to expand my answer in a bit, I'm in the middle of something else, and I can see this taking some time to explain clearly. – Mats Petersson Dec 13 '14 at 12:43
The code in the OP does not contain any fences (at least in the C++11 sense). These are simply atomics which may be implemented via fences (cf. http://preshing.com/20130922/acquire-and-release-fences/). But other implementations are possible and even common. – levzettelin Dec 13 '14 at 12:57
But the whole point of supplying "memory ordering" such as `std::memory_order_acquire` is that, from the language's and compiler's perspective, they are indeed memory order barriers or fences. If that wasn't the case, these types of operations could never be predictable, because anything could be moved across anything - and thus we could never implement a reliable semaphore, spinlock or other locking mechanism. – Mats Petersson Dec 13 '14 at 13:04
@ParkYoung-Bae One can distinguish two types of memory fences: a) the ones in assembly and b) the [`atomic_thread_fence`](http://en.cppreference.com/w/cpp/atomic/atomic_thread_fence) in C++11. As described in the blog article cited above, one can implement an atomic-store-release via an atomic-store-relaxed with a preceding `atomic_thread_fence(release)`. That was what I wanted to say (in admittedly clumsy words). Anyway, I suggest that in a C++ context one should not refer to an atomic-store-release by the name of *release-fence*, as this answer does in the very first sentence. – levzettelin Dec 13 '14 at 18:10
I can't actually make heads or tails of the C++11 spec wording. I know what I describe in the text above is "why it works", but I can't find anything in the spec that clearly states that (in Swedish, we'd say it "walks around it like a cat around hot porridge" - it never really gets to the core of the matter and explains it, just says lots of "If X then Y - but not if Z"). It does make it very clear that `std::memory_order_relaxed` is allowed to reorder. I probably should remove my answer, but since that would make this comment discussion disappear, I don't think that would be fair. – Mats Petersson Dec 14 '14 at 09:40
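The fence-based formulation mentioned in the comments above (a relaxed store preceded by `atomic_thread_fence(release)`) can be sketched next to the plain store-release as follows. This is only an illustrative sketch: the function `publish_and_read` and the names `payload`, `ready` and `seen` are hypothetical and do not come from the question or the talk.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: publishes `value` to a reader thread, either with a
// store-release or with a release fence followed by a relaxed store, and
// returns what the reader saw.
int publish_and_read(int value, bool use_fence) {
    int payload = 0;
    std::atomic<bool> ready{false};
    int seen = -1;  // written only by the reader, read only after join()

    std::thread reader([&] {
        // The acquire load pairs with either publishing variant below.
        while (!ready.load(std::memory_order_acquire)) {}
        seen = payload;
    });

    payload = value;
    if (use_fence) {
        // Variant b): release fence placed *before* a relaxed store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    } else {
        // Variant a): a single atomic store with release ordering.
        ready.store(true, std::memory_order_release);
    }

    reader.join();
    assert(seen == value);
    return seen;
}
```

In both variants the reader is guaranteed to see the value written to `payload`; the difference is only in how the release ordering is expressed in the source.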
