
Premise1: Each instruction has a throughput of 1 instruction per clock cycle (obviously factually wrong, but let's assume it).

Can the CPU issue both an ADDPS and an SUBPS on the same clock cycle? Is that behavior the same for integer instructions?

A more extreme example: Can the CPU issue a MULLW, a MULHW and a MULHRSW instruction on the same clock cycle?

Does "Premise2" affect the behavior and if so, how?
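A toy model can make the question concrete. The sketch below is an illustration under loud assumptions, not data for any real CPU. It assumes every instruction is a single uop, all uops are independent and already queued, and each execution port can start one uop per cycle. Both port tables are hypothetical. `FAKE_PREMISE1` gives every mnemonic its own unit, as in premise 1. `SHARED_PORTS` is a made-up layout where all five instructions share two ports. For real port assignments, see https://uops.info/. The names MULLW/MULHW/MULHRSW follow the question; the actual x86 mnemonics are PMULLW/PMULHW/PMULHRSW.

```python
# Toy model, NOT a real CPU: each instruction is one uop, all uops are
# independent and already queued, and each port can start one uop per cycle.
FAKE_PREMISE1 = {  # hypothetical: premise 1, every mnemonic has its own unit
    "ADDPS": ["addps_unit"], "SUBPS": ["subps_unit"],
    "MULLW": ["mullw_unit"], "MULHW": ["mulhw_unit"], "MULHRSW": ["mulhrsw_unit"],
}
SHARED_PORTS = {  # hypothetical: all five instructions compete for ports p0/p1
    m: ["p0", "p1"] for m in ("ADDPS", "SUBPS", "MULLW", "MULHW", "MULHRSW")
}


def dispatch_cycles(uops, port_map):
    """Cycles needed to dispatch all uops, using greedy oldest-first assignment."""
    pending = list(uops)
    cycles = 0
    while pending:
        busy = set()      # ports already used in this cycle
        waiting = []      # uops that found no free port in this cycle
        for uop in pending:  # oldest first, like a simplified scheduler
            free = [p for p in port_map[uop] if p not in busy]
            if free:
                busy.add(free[0])
            else:
                waiting.append(uop)
        pending = waiting
        cycles += 1
    return cycles


for name, ports in (("premise 1", FAKE_PREMISE1), ("shared p0/p1", SHARED_PORTS)):
    print(name,
          dispatch_cycles(["ADDPS", "SUBPS"], ports),
          dispatch_cycles(["MULLW", "MULHW", "MULHRSW"], ports),
          dispatch_cycles(["MULHW", "MULHW", "MULHRSW", "MULHRSW"], ports))
```

This prints `premise 1 1 1 2` and `shared p0/p1 1 2 2`. Under premise 1, the three different multiplies dispatch in one cycle. Once they share two ports, they need two cycles. The ADDPS + SUBPS pair fits in one cycle either way, because the shared layout has two ports. The per-instruction throughput numbers are identical in both tables; only the port mapping tells you which instructions compete.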

  • *Can the CPU issue both an ADDPS and an SUBPS on the same clock cycle? Is that behavior the same for integer instructions?* - Do you want an answer for your simplistic premises, or for real CPUs? On all real CPUs, ADDPS and SUBPS compete for the same FP-add execution unit(s), so the throughput numbers for those separate instructions aren't independent. But on a fake CPU that has a different execution port for every different mnemonic, yes, of course they can issue from the front-end in the same cycle, and later dispatch to execution ports in the same cycle. – Peter Cordes Oct 13 '21 at 05:44
  • To find out which instructions actually compete with each other, look at uops / ports data, not those silly Throughput numbers; that's not very useful on its own. https://uops.info/ and/or https://agner.org/optimize/. And [What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?](https://stackoverflow.com/q/51607391). Not clear what you're really trying to ask, but seems to be a duplicate of that Q&A. – Peter Cordes Oct 13 '21 at 05:46
  • I want to know it for real, modern CPUs with multiple execution units - thank you. (Edited question) – MrUnbelievable92 Oct 13 '21 at 05:46
  • Then I'd suggest not starting your question with unrealistic premises that look like you want people to assume them. – Peter Cordes Oct 13 '21 at 05:48
  • I don't think that for the 2 examples provided the premises are unrealistic. Premise 1 simplifies stuff so that I don't have to ask whether the CPU can issue 2 ADDPS and 2 SUBPS on the same cycle. Premise 2 hints at the fact that the CPU is capable of performing the Tomasulo Algorithm in hardware, with all of the resulting CPU behavior. – MrUnbelievable92 Oct 13 '21 at 05:54
  • With those premises, the answer is a trivial yes, assuming of course that the instructions are independent (not like `addps xmm0, xmm1` / `subps xmm0, xmm2` with a dep chain through xmm0). Throughput would only be limited by the front-end, and latency, if there are no back-end throughput conflicts between different instructions. Is that what you're asking? The interesting thing is usually which instructions *do* compete for which execution units (or really for same or different EU on the same *port*), but you've removed that factor, giving every mnemonic its own fully-pipelined EU. – Peter Cordes Oct 13 '21 at 06:01
  • The answer is trivially yes if one knows that the throughput metric on its own doesn't say anything. The question was essentially whether or not instructions like MULLW and MULHW are physically performed by the same piece of hardware: whether MULHW/MULHRSW is a byproduct of MULLW or both are completely independent of each other, and whether or not MULHW and MULHRSW compete for the same resources, which becomes interesting once you have 2 MULHW and 2 MULHRSW instructions in the queue. Throughput alone suggests that all 4 instructions are issued in 2 clock cycles. (=> Edited) – MrUnbelievable92 Oct 13 '21 at 06:27
  • uops.info alone as an answer would be something I'd accept ;) – MrUnbelievable92 Oct 13 '21 at 06:28
  • Ok, now without premise 2, your question actually makes sense. A simple statement of it could be "Do two different instructions with the same throughput compete for some shared throughput resource, or are they independent?" Anyway, glad you're satisfied with https://uops.info/ as an answer; we can leave it closed as a duplicate since my answers on the linked questions already discuss uops and link to it. – Peter Cordes Oct 13 '21 at 07:07
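The dependency-chain caveat in the comments (`addps xmm0, xmm1` / `subps xmm0, xmm2`) can also be sketched as a toy model. It assumes the premise-world CPU: unlimited fully pipelined execution units, no front-end bottleneck, and register renaming, so only true read-after-write dependencies delay an instruction. The 4-cycle latency for both ADDPS and SUBPS is an assumed value for illustration only. `completion_cycle` is a hypothetical helper, not any tool's API.

```python
# Toy model of the premise-world CPU: unlimited execution units, no front-end
# limit, and register renaming, so only read-after-write dependencies delay
# an instruction.
LATENCY = {"ADDPS": 4, "SUBPS": 4}  # assumed latencies in cycles, illustrative only


def completion_cycle(program, latency):
    """program: list of (mnemonic, dst, srcs). Returns the cycle the last result is ready."""
    ready = {}  # register -> cycle at which its newest value becomes available
    last = 0
    for op, dst, srcs in program:
        # An instruction starts as soon as all of its inputs are ready.
        start = max((ready.get(r, 0) for r in srcs), default=0)
        # Renaming: a new write to dst never waits for earlier readers of dst.
        ready[dst] = start + latency[op]
        last = max(last, ready[dst])
    return last


# Two-operand SSE: "addps xmm0, xmm1" reads xmm0 and xmm1 and writes xmm0.
dependent = [("ADDPS", "xmm0", ["xmm0", "xmm1"]),
             ("SUBPS", "xmm0", ["xmm0", "xmm2"])]   # SUBPS needs ADDPS's result
independent = [("ADDPS", "xmm0", ["xmm0", "xmm1"]),
               ("SUBPS", "xmm3", ["xmm3", "xmm2"])]  # no register in common

print(completion_cycle(dependent, LATENCY), completion_cycle(independent, LATENCY))
```

This prints `8 4`. Even with unlimited execution units, the dependency through xmm0 serializes ADDPS and SUBPS. Both can be issued in the same cycle, but the SUBPS cannot execute until the ADDPS result is ready.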
