What is the general difference between superscalar and out-of-order (OoO) execution?

Question

I've been reading some material on superscalar and OoO execution, and I am confused.
I think their architecture graphs look very much the same.

See also: **[Modern Microprocessors: A 90-Minute Guide!](http://www.lighterra.com/papers/modernmicroprocessors/)**. That article builds up from simple pipelining to deeply-pipelined to superscalar, with diagrams and examples. Then moves on to instruction latencies and dependencies, branches (and prediction), and out-of-order execution. (And predication (data dependencies) to replace branches.) Then some discussion of "brainiac vs. speed demon (e.g. Pentium 4)" and why frequency / power scaling killed P4 and why we have multi-cores instead of ever faster single cores. Highly recommended. — Peter Cordes, Dec 14 '17 at 10:41

score 49 · Accepted Answer · edited Jan 09 '20 at 12:58

Superscalar microprocessors can execute two or more instructions at the same time. Typically they have at least 2 ALUs (although a superscalar processor might have 1 ALU and some other execution unit, like a shifter or jump unit).

More precisely, superscalar processors can start executing two or more instructions in the same cycle. Pipelined processors can have more than one instruction in flight at a time, but a non-superscalar pipelined processor will only start a single instruction in any given cycle. Pipelined execution units take multiple cycles to execute end to end. Put another way: a superscalar processor can usually start two non-pipelined, single-cycle-latency instructions in the same cycle, whereas a non-superscalar pipelined processor cannot have two single-cycle instructions executing in its ALUs at the same time.
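As a rough illustration of the issue-width distinction, here is a minimal sketch, not a model of any real CPU. The function names `issue_cycles` and `total_cycles` are made up for this example, and it assumes a run of fully independent instructions, fully pipelined execution units, and cycles numbered from 0.

```python
def issue_cycles(num_instructions, issue_width):
    """Cycle in which each of a run of independent instructions starts.

    issue_width=1 models a scalar pipelined machine; issue_width>=2 models
    a superscalar one (an assumption-laden toy, ignoring all stalls).
    """
    return [i // issue_width for i in range(num_instructions)]


def total_cycles(num_instructions, issue_width, latency):
    """Cycles elapsed until the last instruction's result is available."""
    return issue_cycles(num_instructions, issue_width)[-1] + latency


# Four independent instructions, each with a 3-cycle pipelined latency.
print(issue_cycles(4, 1))     # scalar pipelined: [0, 1, 2, 3]
print(issue_cycles(4, 2))     # 2-wide superscalar: [0, 0, 1, 1]
print(total_cycles(4, 1, 3))  # 6
print(total_cycles(4, 2, 3))  # 4
```

In both cases several instructions overlap in the pipeline; what changes is how many can start in the same cycle.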

Out-of-order processors can execute instructions out of the original order. For example, in the following, where MULTIPLY takes 5 cycles, instruction 3 may execute before instruction 2 - because instruction 2 is waiting for the 5 cycle result of the MULTIPLY of instruction 1:

1: MULTIPLY reg1 := reg2 * reg3
2: ADD reg4 := reg1 + 5
3: ADD reg6 := reg2 + 1

Most out-of-order processors are also superscalar. However, you can imagine building an out-of-order processor that is not superscalar, one that can only initiate one operation on a pipelined ALU per cycle. (I proposed such processors, when employed by Intel, as low-power chips. Heck, you can build out-of-order processors that are only half-way scalar, e.g. that have only a 16-bit-wide ALU, taking 2 cycles for a 32-bit add, etc. But that's stretching.)

Many superscalar processors, however, are not out-of-order. In the example above, an in-order superscalar would execute instruction 1 first. It would not start instruction 3, but would wait until instruction 2 could start, at which time it would start instructions 2 and 3 together.
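The four combinations (in-order or out-of-order, 1-wide or 2-wide) can be compared with a toy scheduler run on the three-instruction example. This is only a sketch under stated assumptions: the names `Instr`, `producers`, and `schedule` are hypothetical, cycles are numbered from 0, a result becomes usable `latency` cycles after its instruction starts, and WAR/WAW hazards are ignored (as if registers were renamed).

```python
from collections import namedtuple

Instr = namedtuple("Instr", "name dest srcs latency")

# The example from the answer: MULTIPLY takes 5 cycles, ADD takes 1.
PROGRAM = [
    Instr("1: MULTIPLY", "reg1", ("reg2", "reg3"), 5),
    Instr("2: ADD", "reg4", ("reg1",), 1),
    Instr("3: ADD", "reg6", ("reg2",), 1),
]


def producers(program):
    """For each instruction, indices of the earlier instructions it reads from."""
    last_writer = {}
    deps = []
    for i, ins in enumerate(program):
        deps.append([last_writer[r] for r in ins.srcs if r in last_writer])
        last_writer[ins.dest] = i
    return deps


def schedule(program, issue_width, in_order):
    """Return the cycle in which each instruction starts executing."""
    deps = producers(program)
    start = [None] * len(program)
    cycle = 0
    while None in start:
        issued = 0
        for i in range(len(program)):
            if issued == issue_width:
                break  # no more issue slots this cycle
            if start[i] is not None:
                continue  # already started
            ready = all(
                start[p] is not None and start[p] + program[p].latency <= cycle
                for p in deps[i]
            )
            if ready:
                start[i] = cycle
                issued += 1
            elif in_order:
                break  # younger instructions wait behind the stalled one
        cycle += 1
    return start


print(schedule(PROGRAM, issue_width=2, in_order=True))   # [0, 5, 5]
print(schedule(PROGRAM, issue_width=2, in_order=False))  # [0, 5, 0]
print(schedule(PROGRAM, issue_width=1, in_order=False))  # [0, 5, 1]
print(schedule(PROGRAM, issue_width=1, in_order=True))   # [0, 5, 6]
```

The in-order 2-wide run starts instructions 2 and 3 together at cycle 5, while both out-of-order runs start instruction 3 before instruction 2; the 1-wide out-of-order run just cannot start it in the same cycle as instruction 1.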

Sometimes you have to think about unlikely limit cases, such as 1-wide or half-wide OOO machines, to understand the concepts.
