After reading this answer to "Can x86 reorder a narrow store with a wider load that fully contains it?", I have a question about it (sorry, my low reputation does not allow me to comment there).
That thread describes a simple program running on a core, in which a store is followed by a load in program order. The core uses store-to-load forwarding to hand the following load the data of the store that is still waiting in the store queue (write buffer) to commit to L1D cache. The answer continues:
This by itself isn't reordering yet (the load sees the store's data, and they're adjacent in the global order), but it leaves the door open for reordering. The cache line can be invalidated by another core after the load, but before the store commits. A store from another core can become globally visible after our load, but before our store.
So the load includes data from our own store, but not from the other store from another CPU. The other CPU can see the same effect for its load, and thus both threads enter the critical section.
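To check that I understand the scenario, here is a toy model of it in Python. It is only a sketch under my own assumptions, not a description of how the hardware is built: the `Core` class and its `store`, `wide_load` and `commit` methods are hypothetical names, a plain list stands in for the coherent cache, and the wide load is assumed to take each byte from the core's own store buffer when present and from shared memory otherwise.

```python
class Core:
    """Toy core: a private store buffer in front of shared memory."""

    def __init__(self, memory):
        self.memory = memory      # shared list of bytes (stand-in for coherent cache)
        self.store_buffer = {}    # byte index -> value, not yet globally visible

    def store(self, index, value):
        # The store waits in the store buffer; other cores cannot see it yet.
        self.store_buffer[index] = value

    def wide_load(self):
        # Assumed forwarding rule: bytes we stored come from our own store
        # buffer, all other bytes come from shared memory.
        return [self.store_buffer.get(i, b) for i, b in enumerate(self.memory)]

    def commit(self):
        # Drain the store buffer: the stores become globally visible now.
        for index, value in self.store_buffer.items():
            self.memory[index] = value
        self.store_buffer.clear()


memory = [0, 0]                   # byte 0 = flag of core 0, byte 1 = flag of core 1
core0, core1 = Core(memory), Core(memory)

core0.store(0, 1)                 # narrow store: each core sets its own flag
core1.store(1, 1)
seen_by_core0 = core0.wide_load() # wide load covering both flags
seen_by_core1 = core1.wide_load()
core0.commit()                    # both stores commit only after both loads
core1.commit()

print(seen_by_core0, seen_by_core1, memory)  # [1, 0] [0, 1] [1, 1]
```

In this interleaving each core sees its own flag set and the other core's flag clear, so both would enter the critical section, even though memory ends up with both flags set.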
Now my point is: if the cache line was invalidated by another core after our load has executed (i.e. after it has returned data from L1D cache), then our load could never have returned the new value written by the other thread's store, because that store became globally visible after our load but before our store committed to L1D cache. Is that correct?