5

I am trying to get familiar with the new memory ordering concepts of c++11 and believed I actully had a quite good grasp on them, until I stumbled upon this implementation of a spin lock:

#include <atomic>

namespace JayZ
{
    namespace Tools
    {
        class SpinLock
        {
        private:
            std::atomic_flag spin_lock;
        public:
            inline SpinLock( void ) : atomic_flag( ATOMIC_FLAG_INIT ) {}

            inline void lock( void )
            {
                while( spin_lock.test_and_set( std::memory_order_acquire ) )
                    ;
            }

            inline void unlock( void )
            {
                lock.clear( std::memory_order_release );
            }
        };
    }
}

It is e.g. equivalently mentioned at http://en.cppreference.com/w/cpp/atomic/atomic_flag
and also in the book "Concurrency in Action". I also found it someplace here at SO.

But I just don't understand why it would work!
Imagine thread 1 calls lock() and test_and_set() returns 0 as the old value --> thread 1 has acquired the lock.
But then thread 2 comes along and tries the same. Now since there has occurred no "store synchronization" (release,seq_cst_acq_rel) thread 1's store to spin_lock should be of type relaxed.
But from this follows that it cannot imao be synchronized with thread 2's read of spin_lock. This should make it possible for thread 2 to read the value 0 from spin_lock and thus acquire the lock as well.
Where is my mistake?

iolo
  • 1,090
  • 1
  • 9
  • 20

3 Answers3

6

Your mistake is in forgetting that spin_lock is an atomic_flag and thus test_and_set is an atomic operation. The memory_order_acquire and memory_order_release is needed to prevent reads from migrating to before the lock operation or writes from migrating to after the unlock. The lock itself is protected by atomicity which always includes visibility.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • 2
    But if what you say is true. Wouldn't that mean, that any relaxed store to an atomic variable would immediately be visible to another thread? And that is definitely not the case, is it? – iolo Feb 09 '13 at 21:03
  • 1
    Any operation on an atomic variable is atomic and immediately visible to any other thread that performs an atomic operation on that same atomic variable. That's what atomic variables do. – David Schwartz Feb 09 '13 at 21:06
  • Ok. So to put it in c++ terms, the atomics behave a little as volatile variables in the sense that the load is never optimized away through caching etc? – iolo Feb 09 '13 at 21:08
  • 8
    You could say that atomic variables actually do what a lot of people have mistakenly thought that volatile variables do. – David Schwartz Feb 09 '13 at 21:10
  • Well probably... They are atomic - volatiles are not! But am I right to assume that as you implied, these atomic variables won't be optimized into a register or cache? – iolo Feb 09 '13 at 21:16
  • There's no way to answer that question because it would require an intimate knowledge of every possible platform C++ code might run on. If the compiler and platform permits them to be optimized into registers or cache while preserving their semantics, then they can be. The C++ standard doesn't care *how* they're implemented, just *what* they do. (And on modern x86 platforms, atomics are aggressively cached. Otherwise performance would be terrible.) – David Schwartz Feb 09 '13 at 21:22
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/24240/discussion-between-iolo-and-david-schwartz) – iolo Feb 09 '13 at 21:23
  • I'm pretty sure this answer is false. For example: `atomic a{0}, b{0}; void t1() { a.store(1,relaxed); b.store(1,relaxed); } void t2() { int _b = b.load(relaxed); int _a = a.load(relaxed); cout << _a << ' ' << _b; }` I don't see any requirement of the C++ memory model that would prohibit this code (suitably filled out) from outputting "0 1". But that output clearly means that even though the store to `a` happened it's not visible when `a` is loaded. @iolo – bames53 Feb 09 '13 at 23:32
  • 1
    @bames53: That's not a visibility issue, it's an ordering issue. Nothing requires those operations to actually take place in the order the code specifies them. Nevertheless, they're atomic when performed and visible to other atomic operations *after* having taken place. (As the answer explains, that's why you need to specify a memory order if you need it to interact in specific ways with other accesses.) – David Schwartz Feb 09 '13 at 23:43
  • Okay, then take the code: `atomic a{0}, b{0}; void t1() { a.store(1,relaxed); b.store(1,relaxed); } void t2() { int _b = b.load(relaxed); int _a = a.load(relaxed); cout << " T2: " << _a << ' ' << _b; } void t3() { int _b = b.load(relaxed); int _a = a.load(relaxed); cout << " T3: " << _a << ' ' << _b; }`. This can legally output: " T2: 0 1 T3: 1 0", demonstrating that whichever order the stores take place then from one of those threads one of those stores is not visible. – bames53 Feb 09 '13 at 23:56
  • 2
    @bames53: There is no such thing as "whichever order the stores take place". These stores are *unordered*. That doesn't mean they have to take place in one of two possible orders, it means they don't take place in an order. (You are making the invalid assumption that there exists some global notion of time and that every load or store takes place at some defined point in this global timeline. That model of time is not required.) Operations on distinct atomic variables are unordered unless you specifically compel an ordering. – David Schwartz Feb 10 '13 at 02:02
  • 1
    Okay, my claim that an atomic store does not necessarily become visible 'immediately' to all threads and thus stores ordered one way (i.e., with a happens-before relationship) on one thread may become visible on another in a different order is equivalent to your claim that stores always become visible immediately but that the 'same' points in time may be ordered differently on another thread. I'm dubious of the value of this concept of 'same time' though, and the standard actually says "Implementations should make atomic stores visible to atomic loads within a reasonable amount of time." – bames53 Feb 10 '13 at 04:56
  • Also I'm still interested in your response to Howard Hinnant [here](http://stackoverflow.com/a/13667895/365496). – bames53 Feb 10 '13 at 05:01
3

For a given atomic variable, there is a "modification order" for it. Once thread 1 test_and_sets the value from 0 to 1, it is impossible for thread 2 to see a 0.

Memory order affects how all other memory addresses are 'synced.' If one thread modifies an atomic variable with a memory-order_release, then any thread that reads the same variable with memory_order_acquire "sees" every memory change the first thread made before it released.

The acquire and release is not about the atomic. It's about making sure every thread that successfully locks the spinlock "sees" the changes of every thread that locked it before.

The modification order is the key to making the algorithm lockfree. Both thread 1 and thread 2 are trying to do a test_and_set on the same variable, so by the rules, one modification "happens before" the other. Because the test_and_set that "happens before" the other gets to "progress," at least one thread must always make progress. This is the definition of lockfree

Cort Ammon
  • 10,221
  • 31
  • 45
2

test_and_set operations on atomic flags are specified to be read-modify-write operations which have special characteristics, one of which is:

Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation. [n3337 § 29.3/12]

This is also why fetch_add, for example, works whereas simple load operations are not required to read the latest value in the modification order.

bames53
  • 86,085
  • 15
  • 179
  • 244