
I am currently reading an article about memory barriers (in the context of MESI) and I have a few questions about it. There are four types of memory barrier: LoadLoad, StoreStore, LoadStore, and StoreLoad. I understand what LoadLoad and StoreStore do.

For example, LoadLoad is a read barrier (`smp_rmb` on Linux) that makes the core process all pending invalidate requests in its invalidate queue. Likewise, StoreStore makes the core wait for all pending store requests.

But what do LoadStore and StoreLoad do? Aren't they the same as LoadLoad and StoreStore? If LoadStore makes the core finish all earlier loads before any later stores, does the core do that by some mechanism other than processing the invalidation queue?
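For reference, the naming convention behind the four types can be written down as a tiny lookup: the first word names the kind of operation that comes earlier in program order on one core, the second names the kind that comes later. This is only an illustrative sketch; `BARRIERS` and `barrier_for` are hypothetical names made up here, not part of any kernel or library API.

```python
# Toy illustration only: which barrier type orders a given pair of memory
# operations. "earlier" and "later" refer to program order on ONE core.
BARRIERS = {
    ("load", "load"): "LoadLoad",
    ("load", "store"): "LoadStore",
    ("store", "store"): "StoreStore",
    ("store", "load"): "StoreLoad",
}


def barrier_for(earlier: str, later: str) -> str:
    """Return the barrier type that keeps `earlier` ordered before `later`."""
    return BARRIERS[(earlier, later)]


if __name__ == "__main__":
    for (earlier, later), name in BARRIERS.items():
        print(f"{earlier:>5} then {later:<5} -> {name}")
```

So LoadStore and StoreLoad are not duplicates of the other two: each of the four names covers a different (earlier, later) pair.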

  • Barriers are always purely local, making *this* core wait (or delay *some* kinds of operations), not directly affecting other cores at all. E.g. a StoreLoad barrier drains the store buffer before later loads. (And is usually part of a full barrier). https://preshing.com/20120710/memory-barriers-are-like-source-control-operations/ / https://preshing.com/20130922/acquire-and-release-fences/ – Peter Cordes Apr 09 '21 at 01:05
  • @PeterCordes But that is not the answer to my question, not even close – questionmark Apr 09 '21 at 05:55
  • I think I misread one of your sentences when I was commenting, I thought there was some implication of a barrier making other / all cores wait. But I don't see that now. Still, the links to Jeff Preshing's articles are highly relevant. – Peter Cordes Apr 09 '21 at 06:00
  • Also related: [Does a memory barrier acts both as a marker and as an instruction?](https://stackoverflow.com/q/50338253) has some stuff about how memory barriers are internally implemented. Also [Why is (or isn't?) SFENCE + LFENCE equivalent to MFENCE?](https://stackoverflow.com/q/27627969) covers x86 barriers. But being a strongly-ordered ISA, only `mfence` (and `lock`ed operations) are relevant for normal stores; only weakly-ordered stores like movnt make `sfence` useful, and lfence is mostly only useful for serializing execution. – Peter Cordes Apr 09 '21 at 06:04
  • @PeterCordes thank you for such an interesting article. Now I can't understand the difference between the four Load/Store-style memory barriers and the Linux memory barriers such as `smp_*mb`. The article has no information about store buffers, the invalidation queue, and so on. It is only about operation reordering, which is quite easy to understand – questionmark Apr 09 '21 at 08:31
  • Pretty obviously, a StoreLoad barrier involves blocking loads from executing until the store buffer has drained. Other barrier types don't need to drain it, that's why they're all (much) cheaper. An "invalidation queue" is not AFAIK a real thing in CPU architecture. It doesn't seem to be a necessary part of a valid(?) mental model, either. Although possibly for understanding PowerPC; I certainly don't claim to fully understand PPC, just x86 (program order + store buffer with store forward) and I think I mostly understand models like AArch64 which are multi-copy atomic. – Peter Cordes Apr 09 '21 at 08:40
  • `smp_mb` is a full barrier, and I think `smp_rmb` = acquire fence, `smp_wmb` = release fence. (See preshing's article). – Peter Cordes Apr 09 '21 at 08:40
  • @PeterCordes OK, got it. But what about the invalidation queue? If it is not a real thing, how do Load_* barriers work? If it were real, such barriers would make this queue commit to the cache, but without it, what happens? – questionmark Apr 09 '21 at 08:45
  • Block later loads from executing until earlier loads have all completed. Many mechanisms are possible for an out-of-order exec CPU, from simply stalling the front-end until then (slow but simple) to doing the blocking in the scheduler or load execution units with more fine-grained tracking. On CPUs that force loads to complete before they can retire, you get LoadStore for free; otherwise a barrier could mark any existing in-flight loads as needing to wait for results before they can retire. (The store buffer takes care of making sure the stores happen after retirement as always.) – Peter Cordes Apr 09 '21 at 08:55
  • See also [How is load->store reordering possible with in-order commit?](https://stackoverflow.com/a/52215515) re: in-order vs. out-of-order exec, and the fact that some real-world CPUs don't force loads to complete (data arrival from a cache miss) before they can retire, even OoO exec. (x86 CPUs do, as part of ensuring their strong memory model, even having to roll back if speculative early loads turn out not to be architecturally allowed.) – Peter Cordes Apr 09 '21 at 08:56
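
The "StoreLoad barrier drains the store buffer before later loads" description from the comments can be made concrete with a toy simulation. This is a minimal sketch, not a model of any real CPU: the `Core` class, its methods, and `run` are hypothetical names invented for illustration, shared memory is just a Python dict, and only one fixed interleaving of two cores is played out. It shows the classic store-buffering case where each core stores to one variable and then loads the other.

```python
class Core:
    """One simulated core with a private FIFO store buffer (toy model)."""

    def __init__(self, memory):
        self.memory = memory      # shared dict standing in for coherent cache/RAM
        self.store_buffer = []    # pending (address, value) pairs, oldest first

    def store(self, addr, value):
        # The store sits in the local buffer; other cores cannot see it yet.
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # Store forwarding: a core sees its own youngest pending store first.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.memory[addr]

    def drain(self):
        # Commit pending stores to shared memory in program order.
        for a, v in self.store_buffer:
            self.memory[a] = v
        self.store_buffer.clear()

    def storeload_barrier(self):
        # Assumption of this model: later loads wait until the buffer is empty.
        self.drain()


def run(use_barrier):
    """Core 0: x = 1; r0 = y.   Core 1: y = 1; r1 = x.   Returns (r0, r1)."""
    memory = {"x": 0, "y": 0}
    c0, c1 = Core(memory), Core(memory)
    c0.store("x", 1)
    c1.store("y", 1)
    if use_barrier:
        c0.storeload_barrier()
        c1.storeload_barrier()
    r0 = c0.load("y")
    r1 = c1.load("x")
    c0.drain()
    c1.drain()
    return r0, r1


if __name__ == "__main__":
    print(run(use_barrier=False))  # (0, 0): both loads ran before either store was visible
    print(run(use_barrier=True))   # (1, 1): stores were committed before the loads
```

Without the barrier, both cores read 0 even though both stores come earlier in program order, which is exactly the StoreLoad reordering. Nothing in this sketch needs an invalidation queue; the only state a StoreLoad barrier touches here is the core's own store buffer.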
