8

I've recently read about CPU instruction reordering for efficiency. But I'm not able to understand how CPU reorders its instructions. I mean compile time reordering is thinkable since the compiler can foresee the upcoming code. But for a CPU which reads instruction one after the other, how does it see upcoming instructions to reorder them

ffff
  • 2,853
  • 1
  • 25
  • 44
  • 2
    The same way you see upcoming words when you're reading a sentence: by looking a little bit ahead. – hobbs Aug 21 '16 at 08:03
  • 1
    It reads faster than it can execute, so it can see a window of upcoming instructions. For details, see some of the links in the [x86 tag wiki](http://stackoverflow.com/tags/x86/info), like Agner Fog's microarch pdf, and also [David Kanter's writeup of Intel's Haswell design](http://www.realworldtech.com/haswell-cpu/). Of course, if you had simply googled "out of order execution", you'd find [the wikipedia article](https://en.wikipedia.org/wiki/Out-of-order_execution), which you should read. – Peter Cordes Aug 21 '16 at 08:04
  • I mean, I don't know much about assembler code, but from what I do remember, each instruction is basically in memory, so if, as said above, the CPU can read faster than it can execute, it can just follow the memory location of the instruction, and look forward. This isn't a fact, it's just something I assume, I haven't thought of it myself before either. – Giora Guttsait Aug 21 '16 at 08:07
  • I wrote [an answer on another question](https://softwareengineering.stackexchange.com/questions/349972/how-does-a-single-thread-run-on-multiple-cores/350024#350024) with details about how a modern x86 CPU core like Haswell finds and exploits instruction-level parallelism with out-of-order execution. – Peter Cordes May 14 '18 at 20:13

2 Answers2

9

Instructions are fetched in program order into an instruction queue; from the queue they are decoded and moved into reservation stations. These reservation stations effectively do the reordering: instructions are dispatched for execution to execution units as their arguments become available and the time all the arguments become available generally does not correspond to the order in the instruction queue/memory.

For an example, using the Tomasulo Algorithm, see these two videos:

Issue (and register renaming): https://youtu.be/I2qMY0XvYHA?list=PLAwxTw4SYaPkNw98-MFodLzKgi6bYGjZs

Dispatch/reordering: https://youtu.be/bEB7sZTP8zc?list=PLAwxTw4SYaPkNw98-MFodLzKgi6bYGjZs

chill
  • 16,470
  • 2
  • 40
  • 44
6

The instructions are decoded in order, but they then go into a collection of "in progress" instructions. Instructions can make forward progress if their dependencies are met.

For example, say the instructions are:

  1. Load register A from memory.
  2. Load register B from memory.
  3. Increment register A.
  4. Increment register B.

It may be that the last two instructions are in progress at the same time and if the memory read for register B completes first (maybe it was already in the L1 cache), then the increment of register B will take place before the increment of register A. (Though, of course, after that instruction is decoded.)

David Schwartz
  • 179,497
  • 17
  • 214
  • 278