
I'm asking this question with a Rust-flavoured snippet, but it targets both Rust and C++, since the memory-model concepts are essentially the same in both.

use std::sync::atomic::{AtomicU32, Ordering};

fn some_func(counter: &AtomicU32) {
    let x = counter.load(Ordering::Relaxed);    // #1
    counter.store(x + 1, Ordering::Relaxed);    // #2
    let _y = counter.load(Ordering::Relaxed);   // #3
}

Question: Imagine some_func is being executed by a thread, and between #2 and #3 the thread gets interrupted, so that #3 ends up executing on a different core. In that case, does the counter variable get synchronized with the last updated value (written on core 1) when the thread resumes on core 2, even though there is no explicit release/acquire? I suppose the entire cache line plus the thread-local storage gets shelved and reloaded when the thread briefly goes to sleep and comes back running on a different core?

Peter Cordes
Rohit Sharma
  • Preemption, context switching, and other such CPU mechanisms are transparent to C++, as long as you respect synchronization requirements. In the context of a single-threaded function call there are no synchronization requirements, and switching cores has no observable effect. – François Andrieux Feb 09 '23 at 16:11
  • This is all within a single thread, so there are no synchronization issues; it will just do what it obviously does. Even if `counter` is just a plain old `int`, not atomic. The CPU takes care of managing context for each thread. It's only when a variable is used by more than one thread that you have to worry about synchronization. – Pete Becker Feb 09 '23 at 16:56

1 Answer


First of all, it should be noted that atomic instructions add synchronization, and do not remove it.

Would you expect:

unsigned func(unsigned* counter) {
    auto x = *counter;    // #1: plain, non-atomic load
    *counter = x + 1;     // #2: plain store of the incremented value
    auto y = *counter;    // #3: plain re-load
    return y;
}

To return anything other than the original value of *counter plus 1?

Yet here, too, the thread could be moved to another core between any two of those statements!

The above code executes fine even when the thread is moved to another core, because the OS takes care during the switch to appropriately synchronize between cores and preserve user-space program order.
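
If you want to convince yourself, here is a rough sketch of an experiment, assuming a Linux machine with at least two online cores; it uses sched_setaffinity to force the thread onto a different core between the statements (the helper name is just for the example). The assertion holds whether or not the migration actually happens:

// Sketch only: Linux-specific; pins the calling thread to a given core.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sched.h>
#include <cassert>
#include <cstdio>

static void pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);   // 0 == the calling thread
}

int main() {
    unsigned counter = 0;

    pin_to_cpu(0);               // run #1 and #2 on core 0
    unsigned x = counter;        // #1
    counter = x + 1;             // #2

    pin_to_cpu(1);               // force a migration before #3
    unsigned y = counter;        // #3, now running on core 1 (if core 1 exists)

    assert(y == x + 1);          // program order is preserved across the switch
    std::printf("y = %u\n", y);
}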

So, what happens when using atomics on a single thread?

Well, you add a bit of processing overhead -- more synchronization -- and the OS still takes care during the switch to appropriately synchronize.

Hence the effect is strictly the same.
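
For illustration, the relaxed-atomic equivalent of func might look something like this (the function name and types are just for the example); called from a single thread with no other writers, it is guaranteed to return the same x + 1, because a thread always observes its own earlier writes, even after migrating to another core:

#include <atomic>

unsigned func_atomic(std::atomic<unsigned>* counter) {
    auto x = counter->load(std::memory_order_relaxed);   // #1
    counter->store(x + 1, std::memory_order_relaxed);    // #2
    auto y = counter->load(std::memory_order_relaxed);   // #3
    return y;   // x + 1, as long as no other thread writes *counter concurrently
}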

Matthieu M.
  • *appropriately manage the caches.* It's not caches that are the problem, it's stuff like the private store buffer inside each core. [Cache is coherent between cores that a single OS is running on and can schedule threads to.](https://stackoverflow.com/a/58535118/224132) What the kernel actually needs is to "take care of synchronization", probably with acquire/release synchronization which it needed for its own kernel data anyway. (Or something stronger for machines like x86 that have special weakly-ordered instructions like `movntps` that aren't ordered by atomic release / acquire.) – Peter Cordes Feb 09 '23 at 17:35
  • So to avoid spreading the common misconception that memory reordering is due to caches, I'd recommend "the OS takes care ... to appropriately synchronize". A better mental model is local reordering of accesses to coherent shared cache, as in https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/. That's why IRIW reordering is so rare; it requires [a microarchitecture that can make stores visible between (logical) cores before they commit to cache](https://stackoverflow.com/a/50679223/224132). I'll edit, you can of course re-edit to whatever you want to say. – Peter Cordes Feb 09 '23 at 17:37
  • @PeterCordes: That's a good point. I was specifically thinking of the store buffer as I wrote, and used "cache" as a generic term for reads/writes, without thinking that in the context of CPUs it would be understood as referring to L1/L2/L3. Thanks for the edit! – Matthieu M. Feb 09 '23 at 18:37