
I have the following lock that I use to prevent multiple threads from accessing the same code:

struct SimpleAtomicThreadLock
{
    constexpr SimpleAtomicThreadLock() : bIsLocked(false) {}
    std::atomic<bool> bIsLocked;

    void lock()
    {
        while (bIsLocked.exchange(true));
    }
    void unlock()
    {
        bIsLocked.store(false);
    }
};

And I use it like this:

void Mesh::drawMesh()
{
    static SimpleAtomicThreadLock lock;

    lock.lock();
    /* BOTH MESH AND MATERIAL NEED TO BE MADE SHADER ACCESS READY */
    if (!mesh->isDeviceLocal() && !mesh->bIsBeingMadeDeviceLocal)
    {
        Engine::makeMeshDeviceLocal(mesh);
    }
    if (!this->material->isReadyForShaderUse() && !this->material->bIsBeingMadeReadyForShaderUse)
    {
        Engine::makeMaterialShaderAccessReady(material);
    }
    lock.unlock();
}

I've been reading about the C++ memory model with respect to atomics, and found out that (ordinarily) the compiler can reorder statements in a function. In this case, is it possible that the calls between lock and unlock are made even before lock?

Zebrafish
    `it possible that the calls in between lock and unlock can be made even before lock?` No. You should use a lock guard though. – tkausl Feb 26 '23 at 16:10
  • By default, atomic operations are sequentially consistent and act as a full memory barrier, because the defaulted parameter forces sequential consistency. Besides, using a spin lock is generally a bad idea. It seems better locally, but can cause huge issues at the application/platform level, like wasting cores, causing heat issues, throttling, decreasing performance, and causing random non-deterministic freezes on some platforms (e.g. game consoles). Not to mention this spin lock can cause contention issues depending on the target platform. If you are writing a game, I strongly advise you to think twice before using them. – Jérôme Richard Feb 26 '23 at 16:29
  • @JérômeRichard The alternative to using a spin lock is to use a mutex, right? Which needs to make a system call for the mutex lock most likely, which is slower, no? – Zebrafish Feb 26 '23 at 16:32
  • A spin lock can be far slower on a gaming console (e.g. PS4/PS5/Xbox) if cores are saturated. Indeed, the OS is not aware of the spinlock and will let a core continuously try to acquire the lock while it could be doing useful work. On such platforms, it can even cause a priority inversion that freezes the whole game if the threads using the lock have different priorities (quite frequent in games). On PC, you should see no freeze, but a spinning thread can be scheduled for a whole quantum (8-10 ms) while a context switch takes only a few (dozen) µs. – Jérôme Richard Feb 26 '23 at 16:39
  • You first need to be sure the context switches and system calls are actually a problem on the target platform for your specific case (this is pretty dependent on the actual platform). A lock is very cheap when it never has to wait for an unlock. If it does, you can use an adaptive lock that spins for a given time and then falls back to a system lock. Besides, writing a good spinlock is not as easy as it seems, due to contention issues. The benefit of system locks is that they do not stab you in the back (sometimes a few months later, for no obvious reason). – Jérôme Richard Feb 26 '23 at 16:46
  • For more information, please read [When should one use a spinlock instead of mutex?](https://stackoverflow.com/questions/5869825/when-should-one-use-a-spinlock-instead-of-mutex), which describes most of my points more precisely. – Jérôme Richard Feb 26 '23 at 16:56
  • Modern mutex implementations are often "futex", which require no system call when uncontended. And if there is contention for the lock, then you're going to be blocking anyway, so the cost of the system call in that case is acceptable. – Nate Eldredge Feb 26 '23 at 18:09

1 Answer


No, it is not possible in this case.

The reason is that the atomic member functions you are calling, std::atomic<bool>::exchange and std::atomic<bool>::store, both take a std::memory_order as a second parameter, and its default value is std::memory_order_seq_cst.

https://en.cppreference.com/w/cpp/atomic/atomic/exchange https://en.cppreference.com/w/cpp/atomic/atomic/store

For info about what this means, you can see: https://en.cppreference.com/w/cpp/atomic/memory_order

Seq Cst (sequential consistency) is the most conservative memory ordering, and is the most commonly recommended one to use, which is why it is the default.

The way the standard is written, inter-thread synchronization is defined in terms of a number of formal properties between different statements in your program, called "synchronizes with", "happens before", and "carries dependency". Informally, as long as other threads lock and unlock your lock appropriately, any side-effects that happen in the critical region "happen before" unlocking, and locking "happens before" those changes. And whenever someone locks or unlocks the lock, that has a "synchronizes with" relationship with other locking operations.

(Usually, the standard doesn't talk about what the compiler is allowed to reorder, because that's too low-level, and they don't like to mandate implementation details. Instead they try to define what counts as "observable behavior" and what is undefined behavior, and the compiler is allowed to make any optimizations that don't change the "observable behavior" of programs that do not have undefined behavior.)

The upshot is that the compiler is not allowed to move operations with memory side effects across your locking and unlocking statements, because doing so would change the observable behavior of your program as defined by the C++ standard.

Note that the fact that seq_cst is used here is critical -- if you were using "relaxed" atomics, this would not hold: the compiler (and the hardware) would be allowed to move the statements outside the critical section, and you would have a bug.

If you would like to understand the memory-model ideas in more detail, I highly recommend Herb Sutter's "atomic<> Weapons" talk: https://www.youtube.com/watch?v=A8eCGOqgvH4

That talk goes into considerable detail with examples about what the memory model means and how it connects not only to the compiler, but to modern hardware architectures, and it also gives a lot of useful and practical advice about how to think about this and how to use it.

One of the main takeaways (IMO) is that you should almost always just use the default of sequential consistency; your programs and locks will generally work the way you think they should, as long as you don't create data races. You usually have to be doing something very low-level, where nanosecond-scale performance is critical, before a weaker memory ordering becomes worth it. Customizing the std::memory_order parameter may also make sense if you are working on something like the standard library, where your lock will be used by an enormous number of projects with a very wide range of needs. Outside of these cases, the performance benefit of anything more complicated than seq_cst is usually not enough to justify the increased complexity of understanding and maintaining the code.

Chris Beck