I tried looking for the answer to this question in the Intel 64/IA-32 manuals, but couldn't find a definitive answer. The question is: do memory ordering instructions, such as SFENCE, affect the local processor only, or do they extend to the entire cache coherence domain, such as CPUs on a neighboring socket (in a multi-socket system)?
1 Answer
`SFENCE` affects the order in which the local CPU's stores become globally visible to other cores on the same and other sockets, or to memory-mapped I/O.

Other cores can't tell whether you ran `SFENCE` or not; all they can observe is the order of your memory operations (i.e. the implementation of `sfence` is internal to a core and its store queue).

`sfence` was introduced in SSE1, with PIII, before the first multi-core CPUs. At that time, the only SMP systems were multi-socket.

Also note that it only does anything useful with weakly-ordered stores (`movnt*` or stores to write-combining memory regions). Normal stores already have "release" semantics on x86. Only `mfence` (and `lock`ed instructions) matter for normal memory operations on x86, to prevent StoreLoad reordering.
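
To make the weakly-ordered case concrete, here is a minimal C++ sketch (x86 only), assuming a compiler that provides `<immintrin.h>`. The names `payload`, `ready`, `producer`, `consumer`, and `smoke_check` are made up for illustration. The producer writes with non-temporal stores (`movnti` via `_mm_stream_si32`), then uses `_mm_sfence()` so those stores become globally visible before the flag store does. Without the fence, a reader on another core or socket could in principle see the flag before the data.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // _mm_stream_si32, _mm_sfence (x86 only)

// Hypothetical shared state for this sketch.
alignas(64) static int payload[16];
static std::atomic<bool> ready{false};

// Writer: fill the buffer with weakly-ordered NT stores, then publish a flag.
void producer() {
    for (int i = 0; i < 16; ++i)
        _mm_stream_si32(&payload[i], i + 1);  // movnti: weakly-ordered store
    _mm_sfence();  // NT stores above become globally visible before later stores
    ready.store(true, std::memory_order_release);  // a plain mov on x86
}

// Reader: would normally run on another core or socket. It needs no fence of
// its own on x86; it only observes the order in which the stores became visible.
int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the flag is published
    }
    int sum = 0;
    for (int v : payload) sum += v;
    return sum;
}

// Single-threaded smoke check: run the producer, then the consumer.
void smoke_check() {
    producer();
    assert(consumer() == 136);  // 1 + 2 + ... + 16
}
```

In real use `producer()` and `consumer()` would run on different threads. With ordinary stores instead of `_mm_stream_si32`, the `_mm_sfence()` would be unnecessary on x86, since normal stores are already ordered with respect to each other.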

- Thanks for the quick response. Very helpful. – dsaada May 19 '16 at 11:21
- @dsaada: if that answers your question, you can use the check-mark under the vote arrows to mark it accepted. – Peter Cordes May 21 '16 at 11:28
- Thanks. Did that. Just a complementary question - you were saying that other _cores_ aren't aware of fence instructions such as **SFENCE**. Does this apply also to other threads in the same core? – dsaada May 22 '16 at 12:33
- @dsaada: `MFENCE` is a more interesting example, because it matters even for "normal" stores/loads. As far as observable effects go, you can't tell. Communication between two hyperthreads on the same physical core has [a very complicated implementation which preserves all the expected semantics while still competitively sharing L1 cache](http://stackoverflow.com/questions/32979067/what-will-be-used-for-data-exchange-between-threads-are-executing-on-one-core-wi/32981256#32981256). Within one core, those barriers would have to be tracked by the memory-order buffer or something. – Peter Cordes May 22 '16 at 12:45