
I want to know if a CPU core can do multiple x86 comparison and add operations at once in parallel.

So if I wrote something like

Compare X y
Compare y z
Add X y
Add q p

Would the compares run at the same time? Would the adds run at the same time?

phuclv
Fábio Santos
    See also [How does a single thread run on multiple cores?](//softwareengineering.stackexchange.com/a/350024) (it doesn't, a single core finds the parallelism and runs instructions in parallel if they're independent.) My answer there explains some of how that works. – Peter Cordes Jan 26 '19 at 21:53

1 Answer


Yes, provided that the CPU has two free execution units that can handle the operation (compares and adds both run on ordinary integer ALUs), the instructions don't depend on each other's results, and the instruction dispatcher can deliver to both units in the same cycle. That's how superscalar CPUs work. All x86 CPUs since the P5 Pentium, as well as virtually all modern general-purpose CPUs, are superscalar.
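To make the "independent data" condition concrete, here is a minimal C sketch (the function names `sum_one_chain` and `sum_four_chains` are made up for illustration, they are not from any library). Both functions compute the same sum. In the first, every add needs the result of the previous one, so the adds form one serial chain. In the second, the four adds in each iteration don't depend on each other, so a superscalar core is free to run several of them in the same cycle. Whether you actually measure a difference depends on the CPU and on the compiler, which may unroll or vectorize either loop on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add depends on the result of the previous add,
   so the adds form a single dependency chain and cannot overlap. */
uint64_t sum_one_chain(const uint32_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four accumulators: the four adds in one iteration are independent of
   each other, so the hardware may execute several of them per cycle. */
uint64_t sum_four_chains(const uint32_t *v, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];

    return s0 + s1 + s2 + s3;
}
```

Calling both functions on the same array should return the same value. Timing them on a large array, with compiler auto-vectorization turned off, is one way to see how much instruction-level parallelism a given core extracts.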

That's one of the reasons newer microarchitectures run faster than older ones even when the clock speed and the number of cores stay the same: they can have more execution units, wider units, and bigger caches, and they can run more instructions at the same time, along with numerous other improvements. For example:

Current x86 processors can deliver 3 instructions per clock cycle. Conroe, however, has been architected to fetch, dispatch, execute and retire up to four full instructions simultaneously, offering a 33% boost over, say, a Pentium 4 CPU

https://hexus.net/tech/tech-explained/cpu/17976-intel-core-2-duo-conroe/

It's hard to give a single number because of micro-ops and macro-fusion. As a rough comparison, though, Sandy Bridge can sustain 6 unfused-domain micro-ops per cycle. For example, it can sustain a throughput of one iteration per clock for a loop consisting of two add instructions with memory sources, a multiply, and a macro-fused dec-and-branch (each add with a memory source counts as a load plus an add, so together with the multiply and the fused branch that makes 6).

phuclv
  • The narrowest point in SnB-family's pipeline (issue/rename) is 4 micro-fused uops wide. But yes, SnB can sustain 6 unfused-domain uops per clock, while SKL can sustain 7. Intel never uses the term "macro-ops", that's what AMD calls their ops on Bulldozer-family. (Especially compared to Pentium 4, AMD CPUs decoded most instructions to fewer ops than Intel, so they called them "macro-ops" instead of "micro-ops" to promote the fact that their CPUs got more done with each op and each cycle back in the P4 days. I think with Ryzen, AMD is calling them micro-ops too.) – Peter Cordes Jan 26 '19 at 21:51