Which spinlock method is more efficient: retry test_and_set(), or spin read-only on test()?

Question

Which spinlock method is better (in terms of efficiency)?

#include <atomic>

#define METHOD 1


int main( )
{
    std::atomic_flag lock { };

#if METHOD == 1
    while ( lock.test_and_set( std::memory_order_acquire ) )
    {
        while ( lock.test( std::memory_order_relaxed ) );
    }
#else
    while ( lock.test_and_set( std::memory_order_acquire ) );
#endif

    lock.clear( std::memory_order_release );
}

This example comes from cppreference. What happens when we add/remove the call to test(std::memory_order_relaxed) inside the outer loop?

I see a noticeable difference in generated code between the two methods (here).

@Peter Cordes What if only one thread is waiting by doing a spinlock? Does it make a difference between the two above methods? — digito_evo, Apr 14 '23 at 07:37
I turned my comments into an answer which mentions advantage for spinning read-only even if there are only 2 threads total (one holding the lock, the other spin-waiting.) — Peter Cordes, Apr 14 '23 at 07:40

score 5 · Accepted Answer · answered Apr 14 '23 at 07:39

5

Generally the version that spins read-only on .test() is best, instead of stealing ownership of the cache line from the thread that's trying to unlock it. Especially if the spinlock is in the same cache line as any other data, like data the lock owner might be just reading, you're creating even more and worse false-sharing this way.

Also, if multiple threads are spin-waiting on the same spinlock, you don't want them wasting bandwidth on the interconnect between cores ping-ponging the cache line containing the lock. (If multiple threads spinning happens at all often, a pure spinlock is usually a bad choice. Normally you'd want to eventually yield the CPU to another thread via OS-assisted sleep/wake, e.g. via futex. C++20 .wait() and .notify_one() can do this, or just use a good implementation of std::mutex or std::shared_mutex.).

See for more details:

Unfortunately C++ lacks a portable function like Rust's core::hint::spin_loop which will compile to a pause instruction on x86, or equivalent on other ISAs.

So a read-only loop will waste more execution resources on a CPU with hyperthreading (stealing them from the other logical core), but waste fewer store-buffer entries and less off-core traffic if anything else is even reading the line. Especially if you have multiple threads spin-waiting on the same lock, ping-ponging the cache line!

If you don't mind a #ifdef __amd64__ / #include <immintrin.h> for _mm_pause(), then you can have that advantage, too.

answered Apr 14 '23 at 07:39

Peter Cordes

328,167
45
605
847

Thanks for the details. I'm trying to make it run on X86-64 but I also want to keep it portable. I have tried `wait`/`notify_one` and they result in much faster code for my program. However, those are good for passing notifications. Some other parts of the program have contentions so they compete over a lock. That's where I need to use spinlocks as above. I could use `acquire`/`release` of `std::binary_semaphore` but they're not `noexcept` so they can cause issues at runtime in case they through inside a catch block trying to release a semaphore. – digito_evo Apr 14 '23 at 07:52
1

@digito_evo: The idea with `wait()` as a fallback for lock contention is that if you spin a few dozen or a few hundred times without getting the lock, you assume that the lock holder is asleep or doing something really slow, and you call `.wait()` on the lock. With clever design you can hopefully figure out how to have the `unlock` function avoid calling `.notify_one()` where there are definitely no waiters, e.g. having a spinning thread increment an `std::atomic` or `std::atomic` which you use instead of `std::atomic_flag`. Or just use glibc mutex which does this already. – Peter Cordes Apr 14 '23 at 07:58
Also is writing a spinlock in different places a good thing? I feel like it's not ok to put one of those loops in whatever block of code needs it. It leads to duplication (and violates DRY). Is it possible to prevent this? Like maybe via an `inline` function that could be called instead of manually writing a spin loop everywhere. Could you show me a proper solution for this? – digito_evo Apr 15 '23 at 09:53
1

@digito_evo: Yeah, you write a function like `inline void lock(std::atomic_flag *)` and/or `inline bool try_lock(std::atomic_flag *)` and call it. Put them in a `.h` where they can inline, of course. Also an `unlock` function that does a `memory_order_release` store. Or better, make an RAII class that works like `std::lock_guard` (https://en.cppreference.com/w/cpp/thread/lock_guard) that you can use on your spinlock the same way as `lock_guard` works on `std::mutex` – Peter Cordes Apr 15 '23 at 15:45

Which spinlock method is more efficient: retry test_and_set(), or spin read-only on test()?

1 Answers1