As is well known, Intel x86_64 processors are not only pipelined but also superscalar.
This means that the CPU can do the following:
Pipelining - in one clock cycle, execute different stages of different instructions. For example, two ADDs overlapped, with their stages shifted by one cycle:
- ADD(stage1) -> ADD(stage2) -> nothing
- nothing -> ADD(stage1) -> ADD(stage2)
Superscalar execution - in one clock cycle, execute several different instructions in the same stage. For example, an ADD and a MUL in parallel:
- ADD(stage1) -> ADD(stage2)
- MUL(stage1) -> MUL(stage2)
This is possible because the processor has several instruction decoders (Intel Core has 4 simple decoders).
But are only the decoders duplicated (the 4 simple decoders), or are the arithmetic units duplicated as well?
That is, can a single CPU core execute, for example, two ADDs in the same stage but on independent arithmetic units (for example, the ALU on Port 0 and the ALU on Port 1)?
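The two timing patterns described above can be reproduced with a toy model. This is a simplified sketch under stated assumptions, not how a real core works: every instruction takes exactly two stages, instructions enter stage 1 in order, and there are no hazards or stalls. The names `schedule`, `show`, and `issue_width` (how many instructions may enter stage 1 per cycle) are hypothetical.

```python
# Toy in-order model (assumption): each instruction spends one cycle per stage,
# NUM_STAGES stages in total, with no hazards or stalls.
# issue_width = how many instructions may enter stage 1 in the same cycle.

def schedule(instrs, num_stages=2, issue_width=1):
    """Return one row per instruction, listing its stage in each cycle."""
    # Last instruction starts at cycle (n - 1) // issue_width and needs num_stages cycles.
    total_cycles = (len(instrs) - 1) // issue_width + num_stages
    rows = []
    for i, name in enumerate(instrs):
        start = i // issue_width  # cycle in which instruction i enters stage 1
        row = []
        for cycle in range(total_cycles):
            stage = cycle - start + 1
            if 1 <= stage <= num_stages:
                row.append(f"{name}(stage{stage})")
            else:
                row.append("nothing")
        rows.append(row)
    return rows


def show(rows):
    """Print rows in the same "- a -> b -> c" notation used in the question."""
    for row in rows:
        print("- " + " -> ".join(row))


# Pipelined only: one instruction enters stage 1 per cycle.
show(schedule(["ADD", "ADD"], issue_width=1))
# Superscalar: two instructions enter stage 1 in the same cycle.
show(schedule(["ADD", "MUL"], issue_width=2))
```

With `issue_width=1` the output matches the shifted pipeline example, and with `issue_width=2` it matches the side-by-side superscalar example.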
- ADD1(stage1) -> ADD1(stage2)
- ADD2(stage1) -> ADD2(stage2)
Are any execution units duplicated so that two identical instructions can execute in the same clock cycle?
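The scenario in the question, two independent ADDs dispatched to ALUs behind different ports in the same cycle, can be expressed as a toy dispatch model. The port map below is a hypothetical assumption for illustration and does not claim to match the actual port layout of any specific Intel core; `PORTS` and `dispatch` are hypothetical names.

```python
# Hypothetical port map (assumption, for illustration only): which operation
# types each execution port can accept. Real layouts differ per microarchitecture.
PORTS = {
    0: {"ADD"},
    1: {"ADD", "MUL"},
    5: {"ADD"},
}


def dispatch(ready_ops, ports=PORTS):
    """Assign independent, ready ops to free ports within one cycle.

    Uses a simple greedy first-fit choice (lowest free port number that accepts
    the op). Returns (assignments, leftover): assignments maps port -> op, and
    leftover lists ops that found no free port and must wait a cycle.
    """
    free = sorted(ports)
    assignments = {}
    leftover = []
    for op in ready_ops:
        for p in free:
            if op in ports[p]:
                assignments[p] = op
                free.remove(p)  # safe: we break out of the loop immediately
                break
        else:
            leftover.append(op)
    return assignments, leftover


print(dispatch(["ADD", "ADD"]))  # both ADDs get a port in the same cycle
print(dispatch(["MUL", "MUL"]))  # only one port accepts MUL in this map
```

Under this assumed map, two independent ADDs both dispatch in the same cycle, while a second MUL has to wait because only one port accepts it. The greedy first-fit choice is a simplification and is not optimal in general; a real scheduler selects ports by its own rules.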