I'm trying to understand the execution flow of uops after they leave the front end and before they are dispatched to an appropriate execution port.
I currently have the following mental model of it:
Front-End (Fetch, Decode, Micro/Macro-Fuse)
|
|
Renamer (Detects dependency chains,
| Eliminates WAR, WAW hazards,
| Allocates resources like SB or LB,
| Binds the uop to some execution port)
|
|
|______ROB (Holds uop until it is fully completed)
|
|
|
RS aka Scheduler (Wait until all source operands are ready)
|
Execution ports
What I do not understand is how the Intel Optimization Manual, section 2.5.3.1, describes it:
The Renamer is the bridge between the in-order part in Figure 2-10, and the dataflow world of the Scheduler. It moves up to four micro-ops every cycle from the micro-op queue to the out-of-order engine. Although the renamer can send up to 4 micro-ops (unfused, micro-fused, or macro-fused) per cycle, this is equivalent to the issue port can dispatch six micro-ops per cycle. In this process, the out-of-order core carries out the following steps:
Renames architectural sources and destinations of the micro-ops to micro-architectural sources and destinations.
Allocates resources to the micro-ops. For example, load or store buffers.
Binds the micro-op to an appropriate dispatch port.
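The first of these steps, register renaming, can be illustrated with a minimal sketch. This is a toy model only: the `rename` function, the `p0`, `p1`, ... physical register names, and the `(dst, srcs)` uop tuples are hypothetical and are not taken from the manual. It shows that giving every destination a fresh physical register removes WAW and WAR conflicts while keeping true (RAW) dependencies.

```python
def rename(uops):
    """Map architectural registers to fresh physical registers (toy model)."""
    table = {}       # architectural name -> current physical register
    next_phys = 0
    renamed = []
    for dst, srcs in uops:
        phys_srcs = []
        for src in srcs:
            # A source never written before gets an initial physical register.
            if src not in table:
                table[src] = f"p{next_phys}"
                next_phys += 1
            phys_srcs.append(table[src])
        # Every destination gets a brand-new physical register.
        table[dst] = f"p{next_phys}"
        next_phys += 1
        renamed.append((table[dst], tuple(phys_srcs)))
    return renamed


program = [
    ("rax", ("rbx",)),   # uop 0: rax = f(rbx)
    ("rcx", ("rax",)),   # uop 1: rcx = f(rax), true dependency on uop 0
    ("rax", ("rdx",)),   # uop 2: rax = f(rdx), WAW with uop 0, WAR with uop 1
]
renamed = rename(program)
for uop in renamed:
    print(uop)
```

In the renamed program, uop 2 writes a different physical register than uop 0, so it no longer has to wait for uop 0 or uop 1, while uop 1 still reads the register that uop 0 writes.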
But the Scheduler (aka RS) is described as follows in section 2.5.3.2:
The scheduler controls the dispatch of micro-ops onto their execution ports. In order to do this, it must identify which micro-ops are ready and where its sources come from: a register file entry, or a bypass directly from an execution unit. Depending on the availability of dispatch ports and writeback buses, and the priority of ready micro-ops, the scheduler selects which micro-ops are dispatched every cycle.
QUESTION: Is the port for a uop selected by the Renamer, while the actual dispatch to that port is performed by the RS (aka Scheduler) as soon as all operands are ready?
I measured the port distribution for a memory copy routine based on vmovdqu and got an almost uniform distribution:
13 493 383 038 uops_dispatched_port.port_2
13 494 860 751 uops_dispatched_port.port_3
It is not clear how this is achieved by the Renamer alone. It does not know when all operands of a uop will become ready, so it is hard to see how it could choose ports in a way that yields a uniform uop distribution.
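To make the puzzle concrete, here is a toy simulation of one possible policy. Everything in it is an assumption for illustration, not something the manual states: the renamer binds each uop to whichever of ports 2 and 3 currently has fewer uops waiting in the scheduler, the scheduler holds at most `rs_capacity` uops, and operand readiness is modelled as a random delay. The names `simulate`, `rs_capacity`, and `max_latency` are hypothetical.

```python
import random


def simulate(num_uops=20_000, ports=(2, 3), rs_capacity=60,
             max_latency=20, seed=0):
    """Toy model: bind at rename time, dispatch when operands are ready."""
    rng = random.Random(seed)
    waiting = {p: [] for p in ports}     # per port: cycle at which each bound uop is ready
    dispatched = {p: 0 for p in ports}
    issued = 0
    cycle = 0
    while issued < num_uops or any(waiting.values()):
        # Renamer (assumed policy): up to 4 uops per cycle, each bound to the
        # port with the fewest waiting uops. It stalls when the RS is full.
        for _ in range(4):
            in_flight = sum(len(q) for q in waiting.values())
            if issued == num_uops or in_flight >= rs_capacity:
                break
            port = min(ports, key=lambda p: len(waiting[p]))
            # The renamer does not use this value; only the scheduler does.
            waiting[port].append(cycle + rng.randint(1, max_latency))
            issued += 1
        # Scheduler: each port dispatches at most one ready uop per cycle.
        for p in ports:
            ready = [t for t in waiting[p] if t <= cycle]
            if ready:
                waiting[p].remove(min(ready))
                dispatched[p] += 1
        cycle += 1
    return dispatched


if __name__ == "__main__":
    print(simulate())
```

In this model the binding decision depends only on per-port occupancy, never on when operands become ready, and the per-port dispatch counts still come out close to each other. Whether real hardware uses such a policy is exactly what the question asks.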