Why not use std::memory_order_acq_rel

Question

I have recently been reading the code of the cppcoro project, and I have a question.

https://github.com/lewissbaker/cppcoro/blob/master/lib/async_auto_reset_event.cpp#L218
https://github.com/lewissbaker/cppcoro/blob/master/lib/async_auto_reset_event.cpp#L284

if (waiter->m_refCount.fetch_sub(1, std::memory_order_release) == 1) // #218
{
    waiter->m_awaiter.resume();
}

Line 218 writes m_refCount with memory_order_release, and line 284 accesses it with memory_order_acquire, so line 284 can load the value correctly. That is OK. But fetch_sub is an RMW operation. To read the modification made in line 284 correctly, doesn't line 218 also need memory_order_acquire? So I wonder why m_refCount doesn't use memory_order_acq_rel in both line 218 and line 284.

return m_refCount.fetch_sub(1, std::memory_order_acquire) != 1; // #284

Thank you.

memory order doesn't change the atomicity of the object in question, they only relax the happens-before rules for *other* objects. — Caleth, Dec 07 '20 at 13:20

David Haim · Accepted Answer · 2020-12-07T17:25:18.997

1

Because this is not how memory orders work.

We add a memory barrier to our atomic operation in order to achieve two things:

to prevent relaxed atomic operations and non-atomic operations from being reordered.
to synchronize non-atomic data across threads.

I wrote an answer here that explains these two points more clearly.
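A minimal sketch of the second point, synchronizing non-atomic data across threads, might look like the following. The function publish_and_read and the names payload and ready are hypothetical and are not taken from cppcoro; the sketch only illustrates the usual release/acquire pairing on a flag.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of cppcoro): one thread publishes a plain int,
// another thread reads it after observing the atomic flag.
int publish_and_read()
{
    int payload = 0;                 // non-atomic data shared between the threads
    std::atomic<bool> ready{false};  // atomic flag that guards the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain, non-atomic write
        ready.store(true, std::memory_order_release);  // publishes the write above
    });

    std::thread consumer([&] {
        // The acquire load pairs with the release store in the producer.
        while (!ready.load(std::memory_order_acquire))
        {
            std::this_thread::yield();
        }
        seen = payload;  // happens-after the producer's write, so no data race
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If both operations on ready were relaxed instead, the flag itself would still be updated atomically, but the read of payload would no longer be ordered after the write and would be a data race.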

When a coroutine is suspended in one thread and resumed in another thread, no additional synchronization is needed*. The cppreference page on coroutines says:

Note that because the coroutine is fully suspended before entering awaiter.await_suspend(), that function is free to transfer the coroutine handle across threads, with no additional synchronization.

As for reordering: the actual logic (waiter->m_awaiter.resume();) is wrapped in an if statement. The compiler cannot reorder the resumption before the fetch_sub anyway, because doing so would ignore the condition of the if statement and break the logic of the code.

So, we don't need any memory order here other than relaxed. The fact that fetch_XXX is an RMW operation means nothing: we use the right memory order for the right use case.

If you like cppcoro, please try my own coroutine library, concurrencpp.

*A more correct statement is: no additional synchronization is needed besides the synchronization needed in order to pass a coroutine_handle from one thread to another.

edited Dec 07 '20 at 17:25

answered Dec 07 '20 at 17:19

David Haim

I understand now, after reading this passage from the answer you linked above: "Using an atomic variable solves the problem - by using atomics all threads are guarantees to read the latest writen-value even if the memory order is relaxed." I had some misunderstandings about memory_order_relaxed. I used to think that when an atomic variable uses memory_order_relaxed, a write in thread A is not guaranteed to be seen promptly by a read in thread B. – breaker00 Dec 08 '20 at 02:25
Read the linked answer again; this is a common misconception. Again, atomic variables are always thread safe. A memory order (MO) is used to synchronize non-atomic data and prevent reordering. The atomic variable doesn't need an MO to be thread safe. – David Haim Dec 08 '20 at 09:30
Hmm, but talking only about thread safety does not resolve my earlier doubt. As I understand it, the thread safety an atomic variable provides is that a reader never sees an intermediate, half-updated value. If an update in thread A is not seen promptly by thread B, which then reads an old value because of caching or similar reasons, I would call that a memory visibility problem rather than a thread safety problem. So I used to think that immediate visibility was provided by some MO; I didn't know it was provided by the atomic variable itself. – breaker00 Dec 08 '20 at 09:58
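The point made in these comments, that the atomic variable itself stays consistent whatever memory order is used, can be sketched as follows. The function count_last_decrementers is a hypothetical helper, not cppcoro code: every thread decrements a shared counter once with memory_order_relaxed, and the sketch counts how many threads saw the previous value 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of cppcoro): returns how many threads observed
// themselves as the last one to decrement the counter.
int count_last_decrementers(int thread_count)
{
    std::atomic<int> ref_count{thread_count};
    std::atomic<int> last_seen{0};

    std::vector<std::thread> threads;
    threads.reserve(thread_count);
    for (int i = 0; i < thread_count; ++i)
    {
        threads.emplace_back([&] {
            // fetch_sub is an atomic read-modify-write: every thread reads a
            // distinct previous value, even with relaxed ordering.
            if (ref_count.fetch_sub(1, std::memory_order_relaxed) == 1)
            {
                last_seen.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : threads)
    {
        t.join();  // join synchronizes with the end of each thread
    }
    return last_seen.load(std::memory_order_relaxed);
}
```

Exactly one thread takes the branch because read-modify-write operations on a single atomic object form one total modification order. This says nothing about the visibility of other, non-atomic data touched by those threads; that is the part a stronger memory order would govern.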

score 0 · Answer 2 · answered Dec 07 '20 at 13:25

The operations on the atomic variable itself will not cause data races either way.

std::memory_order_release means that all modified cached data will be committed to shared memory / RAM. Memory order operations generate memory fences so other objects could be properly committed to / read from shared memory.

answered Dec 07 '20 at 13:25

ALX23z

You can improve by talking about cache coherency. – Surt Dec 07 '20 at 13:29
What do you mean by "RAM"? The physical RAM modules? – curiousguy Dec 21 '20 at 00:45
