I've been reading some material on superscalar and out-of-order (OoO) execution and I am confused.
I think their architecture diagrams look very much the same.

Ciro Santilli OurBigBook.com
cloudygoose
    See also: **[Modern Microprocessors: A 90-Minute Guide!](http://www.lighterra.com/papers/modernmicroprocessors/)**. That article builds up from simple pipelining to deeply-pipelined to superscalar, with diagrams and examples. Then moves on to instruction latencies and dependencies, branches (and prediction), and out-of-order execution. (And predication (data dependencies) to replace branches.) Then some discussion of "brainiac vs. speed demon (e.g. Pentium 4)" and why frequency / power scaling killed P4 and why we have multi-cores instead of ever faster single cores. Highly recommended. – Peter Cordes Dec 14 '17 at 10:41

1 Answer

Superscalar microprocessors can execute two or more instructions at the same time. Typically they have at least two ALUs, although a superscalar processor might instead have one ALU and some other execution unit, such as a shifter or jump unit.

More precisely, superscalar processors can start executing two or more instructions in the same cycle. Pipelined processors can execute more than one instruction at a time, but a non-superscalar pipelined processor will only start a single instruction in any given cycle. Pipelined execution units take multiple cycles to execute end to end. Put another way: a superscalar processor can usually execute two non-pipelined, single-cycle-latency instructions per cycle, whereas a non-superscalar pipelined processor cannot have two single-cycle instructions executing in its ALUs at the same time.
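To put rough numbers on the difference between pipelining and superscalar issue, here is a minimal sketch. It assumes an idealized machine: every instruction is independent, there are no stalls, and every instruction passes through the same number of pipeline stages. The function names and the example figures (8 instructions, 5 stages) are made up for illustration.

```python
import math

def unpipelined_cycles(n_instructions, stages):
    # No overlap at all: each instruction runs start to finish alone.
    return n_instructions * stages

def pipelined_cycles(n_instructions, stages, issue_width):
    # Idealized pipeline: up to `issue_width` instructions start per cycle.
    # The first group finishes after `stages` cycles, and each later
    # group finishes one cycle after the previous one.
    if n_instructions == 0:
        return 0
    groups = math.ceil(n_instructions / issue_width)
    return stages + groups - 1

print(unpipelined_cycles(8, 5))      # 40
print(pipelined_cycles(8, 5, 1))     # 12  (pipelined, scalar)
print(pipelined_cycles(8, 5, 2))     # 8   (pipelined, 2-wide superscalar)
```

The scalar pipeline already overlaps instructions, but it still starts only one per cycle; the 2-wide machine starts two per cycle, which is the superscalar part.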

Out-of-order processors can execute instructions out of the original order. For example, in the following, where MULTIPLY takes 5 cycles, instruction 3 may execute before instruction 2 - because instruction 2 is waiting for the 5 cycle result of the MULTIPLY of instruction 1:

1: MULTIPLY reg1 := reg2 * reg3
2: ADD reg4 := reg1 + 5
3: ADD reg6 := reg2 + 1

Most out-of-order processors are also superscalar. However, you can imagine building an out-of-order processor that is not superscalar, one that can only initiate one operation on a pipelined ALU per cycle. (I proposed such processors, while employed by Intel, as low-power chips. Heck, you can build out-of-order processors that are only half-way scalar, e.g. that have only a 16-bit-wide ALU and take 2 cycles for a 32-bit add. But that's stretching.)

Many superscalar processors, however, are not out-of-order. In the example above, an in-order superscalar would execute instruction 1 first. It would not start instruction 3, but would wait until instruction 2 could start, at which time it would start instructions 2 and 3 together.
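The two properties are independent knobs, which a toy issue simulator can make concrete. This is only a sketch under stated assumptions, not a model of any real core: ADD is assumed to take 1 cycle (the example only gives MULTIPLY's 5 cycles), execution units are fully pipelined, only true (read-after-write) dependencies are tracked as if registers were renamed, and cycles are counted from 0. The names `Instr`, `producers`, and `schedule` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    dst: str
    srcs: tuple
    latency: int  # cycles until the result can be consumed

# The three-instruction example; ADD latency of 1 is an assumption.
PROGRAM = [
    Instr("1: MULTIPLY", "reg1", ("reg2", "reg3"), 5),
    Instr("2: ADD", "reg4", ("reg1",), 1),
    Instr("3: ADD", "reg6", ("reg2",), 1),
]

def producers(program):
    # For each instruction, the indices of earlier instructions whose
    # results it reads (true dependencies only).
    last_writer, deps = {}, []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dst] = i
    return deps

def schedule(program, width, out_of_order):
    # Return the cycle in which each instruction starts executing.
    deps = producers(program)
    start = [None] * len(program)
    cycle = 0
    while None in start:
        issued = 0
        for i in range(len(program)):
            if issued == width:
                break  # issue width exhausted for this cycle
            if start[i] is not None:
                continue  # already started
            ready = all(
                start[p] is not None and start[p] + program[p].latency <= cycle
                for p in deps[i]
            )
            if ready:
                start[i] = cycle
                issued += 1
            elif not out_of_order:
                break  # in-order: a stalled instruction blocks younger ones
        cycle += 1
    return start

for width in (1, 2):
    for ooo in (False, True):
        print(width, "wide,", "OoO" if ooo else "in-order", schedule(PROGRAM, width, ooo))
```

With these assumptions, the start cycles for instructions 1, 2, 3 come out as `[0, 5, 6]` for 1-wide in-order, `[0, 5, 5]` for 2-wide in-order (instructions 2 and 3 start together, as described above), `[0, 5, 1]` for 1-wide out-of-order, and `[0, 5, 0]` for 2-wide out-of-order. Width controls how many instructions can start per cycle; out-of-order controls whether a stalled instruction holds up the ones behind it.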

Sometimes you have to think about unlikely limit cases, such as 1-wide or half-wide OOO machines, to understand the concepts.

Flow
Krazy Glew