My question is: if more instructions are issued in a cycle than the functional units can execute, what algorithm does the hardware use to decide which instructions get executed first?
For example, looking at the Zen 2 architecture here, let’s say that in one cycle 4 instructions are sent from the dispatcher to the floating-point unit. Let’s also assume there are no data dependencies among these 4 instructions. Now, if these instructions can only be executed by 2 of the 4 functional units in the floating-point unit, how does the hardware determine which instructions in the scheduling queue to execute first?
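
To make the scenario concrete, here is a toy model of what I mean. The oldest-first rule in it is only my guess at one possible policy (it is a heuristic commonly described for out-of-order schedulers), not something I know Zen 2 actually does. The names `Uop`, `select_oldest_first`, the `age` tag, and the port numbers are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    age: int          # hypothetical age tag: lower means dispatched earlier
    name: str
    ports: frozenset  # functional units that are able to execute this op


def select_oldest_first(ready, free_ports):
    """Assign at most one ready op to each free port, considering oldest ops first."""
    issued = {}
    free = set(free_ports)
    for uop in sorted(ready, key=lambda u: u.age):
        usable = sorted(uop.ports & free)
        if usable:
            port = usable[0]          # take the lowest-numbered usable port
            issued[port] = uop.name
            free.remove(port)
    return issued


# Four independent ops, all ready in the same cycle, but each can only
# run on units 0 and 1 of a 4-unit floating-point block.
ready = [
    Uop(3, "op_d", frozenset({0, 1})),
    Uop(0, "op_a", frozenset({0, 1})),
    Uop(2, "op_c", frozenset({0, 1})),
    Uop(1, "op_b", frozenset({0, 1})),
]

issued = select_oldest_first(ready, free_ports={0, 1, 2, 3})
print(issued)  # {0: 'op_a', 1: 'op_b'}
```

Under this assumed policy the two oldest ops go first and `op_c` and `op_d` wait in the queue for a later cycle. What I would like to know is whether real hardware does something like this, or uses a different selection rule.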