
What is the interaction of memory fences in different threads?

More particularly, does a memory fence in a thread only prevent the reordering of instructions within that thread, or is there synchronisation among threads, such as one thread waiting until the corresponding fence is reached in another thread? What happens if multiple threads have the same type of fence, or if multiple threads have paired fences? What types of fences need to pair together among threads, and how are they used? What are the effects if code with memory fences is run single-threaded?

  • Does this answer your question? [Does memory fencing blocks threads in multi-core CPUs?](https://stackoverflow.com/questions/51809107/does-memory-fencing-blocks-threads-in-multi-core-cpus) – Tsyvarev Dec 11 '20 at 12:12

1 Answer


A memory barrier is nothing like a thread barrier (a rendezvous point where every thread waits until all the others have arrived).

Memory barriers only order the local core's own accesses to coherent shared memory, because that's all that's needed to be able to recover sequential consistency. There is no direct interaction with other threads / cores.
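To make that concrete: in the C++ memory model, a standalone fence doesn't block or signal anyone; it only becomes useful when it's paired, through an atomic variable, with a fence or atomic operation in another thread. A minimal sketch (the names `payload` and `flag` are mine, not from the question):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> flag{false};

void writer() {
    payload = 42;                                         // ordinary store
    std::atomic_thread_fence(std::memory_order_release);  // orders the store above...
    flag.store(true, std::memory_order_relaxed);          // ...before this flag store
}

void reader() {
    while (!flag.load(std::memory_order_relaxed)) {}      // spin until the flag is seen
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    assert(payload == 42);                                 // guaranteed; not a data race
}

int main() {
    std::thread t1(writer), t2(reader);
    t1.join();
    t2.join();
}
```

Note that the only "waiting" here is the spin loop on `flag`; the fences themselves never wait, they just constrain the order of each thread's own accesses.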

If you want synchronization between threads, use release/acquire ordering. https://preshing.com/20120913/acquire-and-release-semantics/
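For example (a minimal sketch, not taken from the linked article; the names `data` and `ready` are assumptions), the same publish/consume pattern with release/acquire on the atomic operations themselves, with no standalone fences needed:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int data = 0;
std::atomic<bool> ready{false};

void producer() {
    data = 1;                                      // plain store
    ready.store(true, std::memory_order_release);  // release: publishes everything before it
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire: pairs with the release store
    assert(data == 1);  // cannot fire: the acquire load synchronized with the release store
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}
```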

Peter Cordes
  • so hyperthreading is outside the scope of this discussion? – grayxu Mar 27 '23 at 16:27
  • @grayxu: Even if a fence instruction had some effect on other logical cores (like draining their store buffer), you wouldn't know where in their sequence of operations you'd placed a divider. One logical core running a barrier instruction might or might not impact the performance of others, but it's not going to be a building block for correctness. Unlike an MPI `barrier()` function, it doesn't wait for all threads to reach a certain point; there's no low-level instruction that does that, even on SMT cores of a single physical core; if you want that you have to implement it in higher-level software. – Peter Cordes Mar 27 '23 at 17:18
  • @grayxu: Since you asked about hyperthreading (Intel's brand name for the concept of SMT), hyperthreading statically partitions the store buffer, so each logical core effectively has its own. [They can't store-forward to each other](https://stackoverflow.com/q/32979067/224132), that would violate the memory model which guarantees all threads can agree on the order of all stores (and that it's an interleaving of program order). Unlike on PowerPC where that is a thing, and even `seq_cst` loads need extra barriers just to [block IRIW reordering](https://stackoverflow.com/q/27807118/224132). – Peter Cordes Mar 27 '23 at 17:22
  • 1
    @grayxu: Semi-related [What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings?](https://stackoverflow.com/q/45602699) . But anyway, I'm pretty sure x86 full barriers like `lock add` and `mfence` don't slow down the other logical core, and instruction barriers like `lfence` will just stall this logical core's front-end until it's half of the ROB drains (also statically partitioned: https://agner.org/optimize/.) – Peter Cordes Mar 27 '23 at 17:27