I have std::atomic<T> atomic_value;
(where T is bool, int32_t, int64_t, or any other type). If the first thread does
atomic_value.store(value, std::memory_order_relaxed);
and the second thread, at some points in its code, does
auto value = atomic_value.load(std::memory_order_relaxed);
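A minimal, self-contained sketch of this setup, assuming T = int. The helper name relaxed_handoff and the use of 0 as the "not written yet" value are hypothetical, chosen only for illustration. The reader spins on a relaxed load until it sees the new value. Relaxed ordering guarantees that every load returns some value actually stored to atomic_value. On timing, though, the standard only says that implementations *should* make atomic stores visible to atomic loads within a reasonable amount of time, so it gives no cycle count.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: one thread publishes `value` with a relaxed store,
// another thread spins with relaxed loads until it observes it.
// Assumes `value` != 0, because 0 is used as the "not written yet" state.
int relaxed_handoff(int value) {
    std::atomic<int> atomic_value{0};
    int observed = 0;

    std::thread reader([&] {
        // Spin until the store becomes visible to this thread.
        while ((observed = atomic_value.load(std::memory_order_relaxed)) == 0) {
            // busy-wait; a real program might add a pause/yield here
        }
    });

    std::thread writer([&] {
        atomic_value.store(value, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return observed;  // join() makes `observed` safe to read here
}
```

Calling relaxed_handoff(42) from main() should return 42 once the reader has seen the store. The program cannot predict how long the spin loop runs before that happens.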
How quickly does this updated value propagate from the first thread to the second, i.e. between CPU cores (on any CPU model)?
Is it propagated almost immediately? For example, at the speed of cache-coherence propagation on Intel CPUs, meaning 0-2 cycles or so, and perhaps a few more cycles on other CPU models/manufacturers.
Or can the value sometimes stay stale (not updated) for many, many cycles?
Does std::atomic guarantee that the value is propagated between CPU cores as fast as possible on a given CPU?
What if, instead, the first thread does
atomic_value.store(value, std::memory_order_release);
and the second thread does
auto value = atomic_value.load(std::memory_order_acquire);
Will that make the value propagate faster (note that both memory orders changed)? Does it then come with a speed guarantee, or is the speed guarantee the same as with relaxed ordering?
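One way to explore the speed question empirically is a ping-pong test. Two threads bounce a counter back and forth through a single atomic. Half of the average round-trip time gives a rough upper estimate of how long one store takes to become visible on the other core. The sketch below is a hypothetical, rough benchmark: the name round_trip_ns and its parameters are made up for illustration. It does not pin threads to cores, so results depend on OS scheduling, on the CPU, and on the loop overhead included in each round trip. Passing release/acquire instead of relaxed lets you compare the two orderings on your own hardware. On x86, for instance, relaxed and release stores and relaxed and acquire loads typically compile to the same plain MOV instructions.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical micro-benchmark: average time (ns) of one round trip
// main thread -> thread b -> main thread through a single atomic counter.
// Odd values mean "b's turn", even values mean "main thread's turn".
// Requirements: iterations > 0; store_order must be relaxed, release or
// seq_cst; load_order must be relaxed, acquire or seq_cst.
double round_trip_ns(int iterations,
                     std::memory_order store_order,
                     std::memory_order load_order) {
    std::atomic<long> turn{0};

    std::thread b([&] {
        for (long i = 0; i < iterations; ++i) {
            // Wait for the ping (odd value), then answer with the next even value.
            while (turn.load(load_order) != 2 * i + 1) { }
            turn.store(2 * i + 2, store_order);
        }
    });

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        turn.store(2 * i + 1, store_order);             // ping
        while (turn.load(load_order) != 2 * i + 2) { }  // wait for pong
    }
    auto stop = std::chrono::steady_clock::now();
    b.join();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Typical use is to call round_trip_ns(100000, std::memory_order_relaxed, std::memory_order_relaxed) and round_trip_ns(100000, std::memory_order_release, std::memory_order_acquire) a few times each. Then compare the numbers on the same machine rather than treating either one as an absolute figure.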
As a side question: does replacing relaxed ordering with release/acquire also synchronize all modifications of other (non-atomic) variables?
That is, is everything the first thread wrote to memory before the store-with-release guaranteed to be visible to the second thread in its final state (the same as in the first thread) at the point of the load-with-acquire? This assumes, of course, that the load returned the new (updated) value.
So does this mean that, for ANY type of std::atomic<> (or std::atomic_flag), a store-with-release in one thread synchronizes all memory writes made before it with a load-with-acquire of the same atomic in another thread, provided that the other thread actually observed the updated value? (Of course, if the second thread does not yet see the new value, then we expect that the earlier memory writes may not have become visible yet.)
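The pattern described in this side question is the classic message-passing idiom. In the sketch below, the names publish_and_read, payload and ready are hypothetical. The producer writes a plain, non-atomic int, then sets a std::atomic<bool> with memory_order_release. The consumer spins with memory_order_acquire and, once it observes true, reads the payload. Under the C++ memory model, a release store synchronizes-with an acquire load that reads the stored value. The payload write therefore happens-before the payload read, and there is no data race. If both flag operations were relaxed, reading payload this way would be a data race and hence undefined behavior.

```cpp
#include <atomic>
#include <thread>

// Hypothetical message-passing sketch: a non-atomic payload is published
// through an atomic flag using release/acquire ordering.
int publish_and_read(int message) {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // the flag that carries the ordering
    int received = -1;

    std::thread consumer([&] {
        // Acquire load: once it returns true, every write the producer made
        // before its release store (including `payload`) is visible here.
        while (!ready.load(std::memory_order_acquire)) { }
        received = payload;           // no data race: ordered by release/acquire
    });

    std::thread producer([&] {
        payload = message;                              // plain write
        ready.store(true, std::memory_order_release);   // publish
    });

    producer.join();
    consumer.join();
    return received;
}
```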
PS. Why this question arose: from the name "atomic"
it seems obvious to conclude (perhaps mistakenly) that by default (without extra constraints, i.e. with just relaxed memory order) std::atomic<> only makes each operation on it atomic, and nothing else, with no other guarantees about synchronization or propagation speed. That is, a write to the memory location happens as a whole (e.g. all 4 bytes at once for int32_t), an exchange with the atomic location performs both the read and the write atomically (effectively in a locked fashion), and an increment performs its three steps, read-add-write, atomically.
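The atomicity guarantees listed here do hold even with memory_order_relaxed: each read-modify-write is indivisible. In the small sketch below, count_with_relaxed_increments is a hypothetical name. Several threads increment a shared counter with fetch_add(1, std::memory_order_relaxed). No increment is lost, so the final count is exactly num_threads × increments_per_thread, even though relaxed ordering says nothing about the ordering of any other memory accesses.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical demo: relaxed fetch_add is still an atomic read-add-write,
// so concurrent increments are never lost.
long count_with_relaxed_increments(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    // After join(), this load sees every increment.
    return counter.load(std::memory_order_relaxed);
}
```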