
Suppose a core writes to an address whose cache line is not present in its L1, so the write goes to the store buffer. Another core then requests that cache line; MESI cannot see the store buffer update and returns the unmodified cache line. The store buffer is flushed shortly after, but the second core has already used the older value.

I don't see how an SFENCE solves this problem. Yes, the cache line will be updated sooner, but the core still has to wait to write the value to L1, and during that time the second core can still request a read, can't it?

curiousguy
intrigued_66
  • How could you tell this is even happening? How can the two processors distinguish between reading the old value after a newer value has been put in the store buffer and reading the old value before a newer value has been put into the store buffer? The store buffer, which only exists on certain modern Intel CPUs, doesn't change Intel's memory ordering guarantees. Using SFENCE isn't supposed to change anything in the absence or presence of a store buffer. What actual problem do you think it's supposed to solve here? – Ross Ridge Sep 20 '15 at 17:09
  • @RossRidge I thought SFENCE is supposed to help solve the problem of a Core modifying a value but not writing it to its cache line and other Cores receiving an old value? Am I wrong and SFENCE's job is to only ensure a core does not execute multiple instructions on data declared as atomic? If I am wrong, how does a second core receive the Store Buffer value instead of the old value? – intrigued_66 Sep 20 '15 at 17:12
  • The SFENCE instruction just affects the order of store instructions as they appear to other processors. Since Intel guarantees that normal store instructions appear to other processors to have been issued in the same order they were actually issued, the SFENCE instruction normally doesn't do anything useful. The second core receiving an old value isn't a problem, so long as it only ever sees the old value before the new one. It can't tell the difference between receiving an old value before it was changed on the first core or after. It can only tell if the value changed from old to new. – Ross Ridge Sep 20 '15 at 17:32
  • SFENCE is a no-op unless you've been using `movnt` stores. Normal stores on x86 are guaranteed to appear in-order (eventually) on other cores. MFENCE is the only barrier that doesn't happen for free in the strongly-ordered x86 memory model. See http://stackoverflow.com/a/32394427/224132. To answer you more directly, unless you use `movnt`, data from normal writes *can't* just go into a store buffer if it misses in cache. The store can't retire until the core owns the cache line and puts it into the `M` (Modified) state of the MESI protocol. – Peter Cordes Sep 20 '15 at 21:20
  • @RossRidge If SFENCE is redundant because the Intel memory model guarantees stores are not reordered with older stores, why isn't LFENCE redundant too, given that the model also guarantees loads can't be reordered with other loads? – intrigued_66 Sep 21 '15 at 21:50
  • @mezamorphic LFENCE is also unnecessary to order loads performed by normal load instructions. – Ross Ridge Sep 21 '15 at 23:08
  • @mezamorphic: http://stackoverflow.com/questions/32705169/does-the-intel-memory-model-make-sfence-and-lfence-redundant – Peter Cordes Sep 28 '15 at 10:26

2 Answers


No, it doesn't prevent the core from "hiding" the stores from MESI (perhaps better called something like the cache coherent domain). In fact, as pointed out in the comments to the OP, SFENCE has no effect on normal x86 stores which are already strongly ordered. It is only useful to put a fence between stores at least one of which is an NT store, or a store to WC memory, etc.
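To illustrate the one case where SFENCE matters, here is a minimal sketch, assuming an x86 target and a compiler that provides `<immintrin.h>`. The names `payload`, `ready`, `producer`, `consumer`, and `publish_and_sum` are made up for this example. The producer fills a buffer with NT stores and then publishes a flag with a normal store; the `_mm_sfence()` is what keeps the NT stores ordered ahead of the flag.

```cpp
#include <atomic>
#include <cassert>
#include <immintrin.h>  // _mm_stream_si32 and _mm_sfence (x86 only)
#include <thread>

// Hypothetical shared state for the illustration.
int payload[16];
std::atomic<bool> ready{false};

void producer() {
    for (int i = 0; i < 16; ++i)
        _mm_stream_si32(&payload[i], i + 1);  // NT store, weakly ordered
    // Without this fence the flag store below could become globally
    // visible before some of the NT stores above.
    _mm_sfence();
    ready.store(true, std::memory_order_release);  // normal store on x86
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the flag is published
    }
    int sum = 0;
    for (int v : payload) sum += v;
    return sum;
}

// Runs one producer and one consumer and returns what the consumer summed.
int publish_and_sum() {
    ready.store(false, std::memory_order_relaxed);
    int sum = 0;
    std::thread c([&] { sum = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    assert(ready.load());  // the flag was published
    return sum;
}
```

If the loop used ordinary assignments instead of `_mm_stream_si32`, the release store alone would be enough on x86 and the `_mm_sfence()` would add nothing.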

The "hiding" here isn't really problematic. x86 has a "total store order": a single global order of stores that is observed by most operations. This order is basically the order in which stores leave the store buffer (are committed to L1). It is not the order in which they enter the buffer, or even the order in which the stores retire. So while a store is still in the store buffer, it effectively hasn't occurred in the total store order, and is invisible in the cache coherent domain.

The only way this causes reordering (on x86) is that it allows later loads to apparently pass earlier stores: a later load reads from the "global order" when it executes (e.g., hits in L1), but an earlier store may still be sitting in the store buffer, which (as above) means it hasn't become part of the global order yet. Preventing that reordering would be prohibitively expensive, but all the other reorderings are prevented just by keeping things in order (load-load and store-store), plus some other mechanism that ensures later stores don't get committed until earlier loads have completed.

If you want to "solve" the store buffer problem, then mfence is your solution. It effectively flushes the store buffer before proceeding.
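As a rough sketch of that store-then-load reordering, here is the classic store-buffering litmus test in C++. `count_both_zero` is a hypothetical helper written for this answer, and the assumption is that the compiler turns the relaxed accesses into plain loads and stores and the `seq_cst` fence into a full barrier (`mfence` or a locked instruction on x86).

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `trials` times
// and returns how often BOTH threads read the old value (0).
int count_both_zero(int trials, bool use_fence) {
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_relaxed);  // may sit in the store buffer
            if (use_fence)
                std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
            r1 = y.load(std::memory_order_relaxed);  // later load
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence)
                std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join();
        t2.join();

        assert(r1 != -1 && r2 != -1);  // both loads ran
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With `use_fence` set to true the count is always zero, because at least one thread must observe the other's store. With it set to false, a nonzero count is allowed but not guaranteed; spawning fresh threads on every trial keeps the code short at the cost of making the reordering rare in practice.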

BeeOnRope
  • Loads actually read from L1D when they *execute*. They become non-speculative when they retire (by definition), but the value is from earlier, when they executed. (I guess technically the value is read sometime between being dispatched to an execution port and retirement, since AGU calculation takes a cycle or two, and they can miss in L1D and sit in a load buffer slot waiting for the value to arrive.) – Peter Cordes Sep 08 '17 at 19:47

As stated in the previous answers, stores will (eventually) become globally visible on other cores in the order they're issued (program order). 'Eventually' is the key, as SFENCE enforces a literal fence on the cycle when the store buffer is drained and the writes in the buffer are made 'globally visible'.

So, yes, SFENCE instructions cause data in the store buffer to be drained to the cache. This is explained in Section 11.10 of the software developer's manual (SDM).

The SFENCE instruction is also described as:

This serializing operation guarantees that every store instruction that precedes the SFENCE instruction in program order becomes globally visible before any store instruction that follows the SFENCE instruction.

Reads passing writes in the buffer is irrelevant in this context.

MCG