C++ atomics memory ordering for some specific use case

Question

I'm in the next situation where I use a atomic<uint64_t> as a counter and increment it from 5 or more threads and use the value before increment to take some decision.

atomic<uint64_t> global_counter;

void thread_funtion(){
    uint64_t local_counter = global_counter.fetch_add(1,std::memory_order_relaxed);
    if(local_counter == 24)
       do_somthing(local_counter);
}

thread_funtion() will be executed by 5 different threads. Once I got the local_counter my code doesn't care anymore if the global_counter changes again while thread_funtion() is running (the business logic is in such a way that I only need have a unique incrementing value per thread_function() call).

Is std::memory_order_relaxed safe to be used in this case ?

alain · Accepted Answer · 2019-12-12T14:22:27.653

7

atomic<...>::fetch_add(..., std::memory_order_relaxed) guarantees the atomic execution, but nothing more.

But even with memory_order_relaxed, there will be one, and only one thread calling do_something(). Since this fetch_add is the only operation on global_counter, and it is executed atomically, the value 24 must be reached exactly once. But there is no guarantee which thread it will be.

edited Dec 12 '19 at 14:22

answered Nov 19 '19 at 14:03

alain

11,939
2
31
51

there is no mutex or other syncronization primitive in the code above that will make sure do_something() will be called by one and only one thread. Can you please elaborate a bit ? – user1934513 Nov 19 '19 at 14:14
@user1934513 Only one call of fetch_add can return 24. – dohashi Nov 19 '19 at 14:25
1

@alain I may be wrong but isn't fetch_add composed of load write store operations which are pipelined and executed out of order on a particular CPU ? – user1934513 Nov 19 '19 at 14:42
1

It's also worth noting that this situation would generally recommend [`memory_order_acquire`](https://en.cppreference.com/w/cpp/atomic/memory_order) to prevent loads from migrating prior to the fetch. As with relaxed that is technically possible which could cause odd behaviors. That said without knowing if there is other code around this it's hard to say for sure. – Mgetz Nov 19 '19 at 14:43
@user1934513 no, `fetch_add` is executed atomically as one atomic operation. It's not the same as if you implemented a `fetch_add` yourself with individual atomic operations which could be executed out of order. – alain Nov 19 '19 at 14:49
totally agree but the instruction it's self is decomposed because the CPU needs first to do a load operation, then a write operation and finally a store operation. – user1934513 Nov 19 '19 at 15:07
I think this is what @Mgetz wants to suggest – user1934513 Nov 19 '19 at 15:24
@user1934513 Reading the answer to this question might be helpful: https://stackoverflow.com/questions/38447226/atomicity-on-x86. The summary is that hardware allows for the load and write to happen atomically. – dohashi Nov 19 '19 at 15:26
The atomicity is guaranteed for the entire operation, and I agree that the implementers of the library must take care to do it properly. I understood Mgetz's comment to mean that if you have other operations in your real code, another memory order might be needed, and I fully agree, but the code in your question works fine with `memory_order_relaxed`. – alain Nov 19 '19 at 15:29
2

@user1934513, the C++ standard says it all: `Atomically replaces the current value with the result of arithmetic addition of the value and arg.` At best, it can be done in a single atomic instruction (i.e. `xadd`) or atomicity can be achieved using locks, CAS loops, etc. – Eric Nov 19 '19 at 15:56
1

@user1934513: On most RISCS it's not even a single instruction: most other ISAs use [LL/SC](https://en.wikipedia.org/wiki/Load-link/store-conditional). But the store only succeeds if this core held exclusive ownership of the cache line since the load, therefore the load and store are stuck together into a single atomic RMW as far as the global order of operations is concerned. For x86, `lock add [mem], 1` does the same thing by making the core hold onto the cache line for the duration. [Can num++ be atomic for 'int num'?](//stackoverflow.com/q/39393850) – Peter Cordes Nov 22 '19 at 03:16
1

@user1934513: TL:DR: unlike a load or store, an RMW can't literally happen instantaneously, but it only has to be atomic wrt. any possible observer (other cores or DMA devices) that respect the MESI protocol, not an omniscient outside observer that can see internal state in a CPU core. – Peter Cordes Nov 22 '19 at 03:17
1

Also, memory ordering / barriers don't make something atomic. The barriers and stuff that seq_cst requires are to order it wrt. *other* operations, not to make it atomic. – Peter Cordes Nov 22 '19 at 03:19
Thanks for your comment, @PeterCordes, yes, as it was written, it was indeed misleading (atomic <-> memory order), hope it's better now. – alain Dec 12 '19 at 14:09
I was replying to comments. Your answer was fine before your edit. In fact you arguably introduced an error: *The operation may be reordered relative to other operations on global_counter.* The phrasing *re*ordered would appear to mean breaking or removing some ordering requirement that you'd get, e.g. from program order. But with multiple threads all independently trying to increment a single counter with no synchronization, there is no ordering even with `seq_cst`. There's nothing to reorder, only a case of letting it run and find out what order your happen to get. relaxed has no effect – Peter Cordes Dec 12 '19 at 14:18
Anyway, the modification order that exists separately for every atomic object *does* respect program-order within a single thread. So in that context, or any other case of a synchronizes-with that does define some inter-thread ordering, multiple operations on `global_counter` *can't* reorder with each other even with mo_relaxed. e.g. `global_counter.store(1, relaxed);` `global_counter.store(2, relaxed);` is guaranteed to leave the long-term value as `2` eventually, once both stores become visible to all other threads. And that if the the `1` store is visible at all, it's before the `2`. – Peter Cordes Dec 12 '19 at 14:22
Ok, yeah that last edit removed the problems. And the more explicit 2nd paragraph is probably an improvement for readers to whom it wasn't already obvious how this worked. – Peter Cordes Dec 12 '19 at 14:23
Ok, thanks again, I removed that now... But my first version said "..relaxed guarantees the atomic execution", which was IMO misleading, because, as you said, it's not the memory order that guarantees the atomic execution. – alain Dec 12 '19 at 14:25

C++ atomics memory ordering for some specific use case

1 Answers1