
As I understand it, a CPU may execute operations in a different order from the one written in the machine code, as an optimization. This is called out-of-order execution.

The term "memory order" refers to the order in which memory accesses happen. For example, a relaxed ordering defines very weak rules, so reordering of memory accesses can easily occur.
There are also memory ordering models, such as TSO on x86. Such a model defines the semantics of the order in which the processor accesses memory.

What I don't understand is how the two are related. Is memory ordering a kind of out-of-order execution, and are there other forms of OoOE besides it?
Or is memory ordering the way out-of-order execution is implemented, so that all reordering done by the processor follows the model's semantics?

Peter Cordes
hidetatz

1 Answer


The general issue is that on a modern multiprocessor system, load and store instructions may become visible to other cores in a different order than program order. Out-of-order execution is one way in which this can happen, but there are others.

For instance, you could have a CPU which executes and retires all instructions in strict program order, but when it does a store instruction, instead of committing it to L1 cache immediately, it puts it in a store buffer to be written to cache later. The store buffer could be designed to write out stores in a different order than they came in; for instance, if a first store misses L1 cache but a second one would hit, you could save time by writing out the second one while waiting for the first one's cache line to load.
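
The reordering store buffer described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not how any real CPU is built: `Store`, `drain_order`, and the idea that each address is its own cache line are all hypothetical names and simplifications introduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <set>
#include <vector>

// Hypothetical toy model: one pending store (address and value).
struct Store {
    int addr;
    int value;
};

// Returns the addresses in the order their stores become visible in cache.
// `cached` holds the addresses whose line is already in L1 (assumption:
// one address per cache line). With `allow_reorder`, a store that hits
// in cache may be written out ahead of an older store that misses.
std::vector<int> drain_order(std::deque<Store> buffer,
                             std::set<int> cached,
                             bool allow_reorder) {
    std::vector<int> visible;
    while (!buffer.empty()) {
        auto it = buffer.begin();  // default: oldest store first
        if (allow_reorder) {
            // Prefer the oldest store whose cache line is already present.
            it = std::find_if(buffer.begin(), buffer.end(),
                              [&](const Store& s) { return cached.count(s.addr) > 0; });
            if (it == buffer.end()) {
                it = buffer.begin();  // everything misses: wait for the oldest
            }
        }
        cached.insert(it->addr);      // the line is (now) in L1
        visible.push_back(it->addr);  // the store becomes visible
        buffer.erase(it);
    }
    assert(buffer.empty());
    return visible;
}
```

With two pending stores to addresses 100 (a miss) and 200 (a hit), the reordering variant makes the store to 200 visible first, while the in-order variant keeps program order.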

Or, even if the store buffer doesn't reorder, you could have a situation where, while a store is still waiting in the store buffer, the CPU executes a load instruction that came later in program order. Other cores will thus see the load happening before the store. This is the situation with x86, for instance.
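
This store-then-load case is often shown as the "store buffering" litmus test. Below is a minimal single-threaded simulation of it, assuming two strictly in-order cores, each with a FIFO store buffer and store forwarding. `ToyCore`, `Memory`, and `run_litmus` are hypothetical names made up for this sketch, and draining the buffer stands in for a full barrier.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <utility>

using Memory = std::map<char, int>;  // shared memory: variable name -> value

// Hypothetical toy core: executes in program order, but stores wait in a
// FIFO store buffer before they reach shared memory.
struct ToyCore {
    std::deque<std::pair<char, int>> store_buffer;

    void store(char var, int value) { store_buffer.emplace_back(var, value); }

    // A load first checks the core's own pending stores (store forwarding),
    // youngest first, and otherwise reads shared memory.
    int load(char var, const Memory& memory) const {
        for (auto it = store_buffer.rbegin(); it != store_buffer.rend(); ++it) {
            if (it->first == var) return it->second;
        }
        return memory.at(var);
    }

    // Write all pending stores to memory in FIFO order (models a barrier).
    void drain(Memory& memory) {
        while (!store_buffer.empty()) {
            memory[store_buffer.front().first] = store_buffer.front().second;
            store_buffer.pop_front();
        }
    }
};

// Core 0: x = 1; r0 = y.    Core 1: y = 1; r1 = x.
std::pair<int, int> run_litmus(bool fence_after_store) {
    Memory memory{{'x', 0}, {'y', 0}};
    ToyCore core0, core1;
    core0.store('x', 1);
    core1.store('y', 1);
    if (fence_after_store) {
        core0.drain(memory);
        core1.drain(memory);
    }
    int r0 = core0.load('y', memory);  // executes while x = 1 may still be buffered
    int r1 = core1.load('x', memory);
    core0.drain(memory);
    core1.drain(memory);
    assert(memory.at('x') == 1 && memory.at('y') == 1);
    return {r0, r1};
}
```

Without the fence, both loads read 0 even though neither core executed anything out of order: each load simply reached memory before the other core's store left its buffer. With the fence, both loads read 1 in this model.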

The memory ordering model defines, in an abstract way, what the programmer is entitled to expect about the order in which loads and stores become visible to other cores (or hardware, etc). It also usually specifies how the programmer can gain stronger guarantees when needed (e.g. by executing barrier instructions). The CPU then has to be designed to provide the defined behavior, which may place constraints on the features it can include. For instance, if the architecture promises TSO, the CPU probably can't include a store buffer that's capable of reordering, unless the designers manage to do it in such a clever way that the reordering can never be noticed by other cores.
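
At the language level, C++ exposes this kind of stronger guarantee through `std::atomic_thread_fence`. The following is a rough sketch of the same store-then-load pattern with real threads. The helper name `count_both_zero` and the iteration scheme are assumptions made for illustration, and which barrier instruction the compiler emits for the fence depends on the target.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-then-load pattern `iterations` times
// and counts how often both threads read 0.
int count_both_zero(int iterations, bool use_fence) {
    assert(iterations >= 0);
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join();  // join() makes r1 and r2 safe to read afterwards
        b.join();

        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With `use_fence` set, the C++ memory model forbids the outcome where both threads read 0, so the count is always 0. Without the fence that outcome is permitted, even on x86, although how often it actually shows up depends on the hardware and on timing. Some toolchains need `-pthread` to build this.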


Nate Eldredge
  • Thank you. I still want to clarify one thing: you wrote "Out-of-order execution is one way in which this can happen, but there are others." Does this mean out-of-order execution is not related to memory reordering (through a store buffer and so on)? I'm confused because Wikipedia says "In modern microprocessors, memory ordering characterizes the CPU's ability to reorder memory operations – it is a type of out-of-order execution." https://en.wikipedia.org/wiki/Memory_ordering – hidetatz Jan 18 '22 at 03:17
  • @hidetatz: It might be a matter of semantics. I would say that strictly speaking, writing to memory or cache is not part of the *execution* of the store instruction, in the usual [pipeline stage terminology](https://en.wikipedia.org/wiki/Classic_RISC_pipeline). The execution is complete once the instruction has computed the data to be stored and the destination address, resolved any possible exceptions, and put the data and address into the store buffer, even though the actual write to cache may happen later. [...] – Nate Eldredge Jan 18 '22 at 03:37
  • @hidetatz: Right, memory ordering is separate from execution ordering. For example, modern Intel CPUs *speculatively* execute loads out of order, but rewind if they discover that everything would not have gone **as if** the loads happened in an order allowed by the ISA memory model. (Program order + a store buffer with store forwarding: [Can a speculatively executed CPU branch contain opcodes that access RAM?](https://stackoverflow.com/q/64141366) ) As Nate says, a store buffer is basically necessary to insulate other cores from speculative execution of stores. – Peter Cordes Jan 18 '22 at 03:37
  • @hidetatz: So from that standpoint, the Wikipedia quote is not quite accurate. But they may also mean "execution" in the more informal sense of "stuff that the CPU does", and it's true that something like a store buffer results in the CPU performing some of its functions (e.g. memory access) in a different order than program order. In that sense it's doing *something* out of order. – Nate Eldredge Jan 18 '22 at 03:39
  • @Nate Thank you. Now I understand that load/store reordering is not really the same as OoO execution. – hidetatz Jan 18 '22 at 04:28
  • @PeterCordes: Thank you. Speculative execution is a good example for me and makes this easier to understand. – hidetatz Jan 18 '22 at 04:29