
When I studied computer organization, we discussed data dependencies and how they limit pipeline throughput: the execution of one instruction stalls until another instruction it depends on has completed.

Is this still the case in modern processors? Is it possible to create a scenario in a real-world program where the CPU has all the data it needs (it is not waiting on memory), but due to data dependencies it still does not run at full speed (maximum instructions per cycle)?

I believe compilers try to break up dependency chains. Are there cases where this is not possible?
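
For concreteness, here is roughly the kind of loop I have in mind (a toy example; assume the array is small enough to stay hot in L1 cache, so the CPU is not waiting on memory):

```c
#include <stddef.h>

/* Naive reduction: every addition depends on the previous sum, so the loop
 * runs at roughly one add per FP-add latency (about 3-4 cycles on recent
 * x86 cores), even though the core could start one or two independent FP
 * adds every cycle. */
double sum_serial(const double *a, size_t n)
{
    double sum = 0.0;                /* single accumulator = one long dependency chain */
    for (size_t i = 0; i < n; i++)
        sum += a[i];                 /* this add depends on the previous one */
    return sum;
}
```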

asked by Bogi
  • Yes. And yes, much to the chagrin of the Itanium engineers. A decent article is [here](https://learn.microsoft.com/en-us/archive/msdn-magazine/2015/may/compilers-what-every-programmer-should-know-about-compiler-optimizations-part-2); scroll to "Instruction Scheduling". – Hans Passant Sep 02 '20 at 21:32
  • FP reductions are a classic example of this because of the higher latency of FP ops: [Why does mulss take only 3 cycles on Haswell, different from Agner's instruction tables? (Unrolling FP loops with multiple accumulators)](https://stackoverflow.com/q/45113527). If you don't unroll with multiple accumulators to hide latency at compile time, there isn't much ILP to find at run time. Strict FP rules are a common reason compilers don't do this for you, because FP math is not strictly associative (see the first sketch after these comments). – Peter Cordes Sep 03 '20 at 01:26
  • Also, it's common for code in general to achieve less than max IPC. Often that's because of stalls from cache misses and branch mispredicts, but even between those there's limited parallelism in some code. Pointer-chasing (walking a linked list) is especially bad because of its loop-carried dependency, as is the data or control dependency inherent in a binary search (see the second sketch below). – Peter Cordes Sep 03 '20 at 01:29
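
To make the multiple-accumulator idea from the comment above concrete, here is a minimal sketch (the function name and the choice of four accumulators are illustrative, not taken from any particular codebase). Splitting the sum into independent chains lets the out-of-order core overlap the FP-add latencies; compilers generally only perform this reassociation themselves under relaxed FP rules such as GCC/Clang's -ffast-math or -fassociative-math, because it changes the order of the additions.

```c
#include <stddef.h>

/* Four independent accumulators = four dependency chains the CPU can
 * interleave. With roughly 4-cycle FP-add latency and up to 2 adds started
 * per cycle, around 8 accumulators would be needed to saturate the FP units;
 * 4 is enough to show the effect. */
double sum_unrolled(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)               /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);    /* combine the chains at the end */
}
```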
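
And a minimal sketch of the pointer-chasing case from the last comment (again with illustrative names): each iteration's load address is the result of the previous iteration's load, so neither the compiler nor the out-of-order hardware can overlap successive nodes, no matter how the code is written.

```c
/* Linked-list walk: node i+1's address is only known once the load of node i
 * completes, so traversal speed is limited by load-use latency (an L1 hit
 * every ~4-5 cycles at best, far more on cache misses), not by issue width. */
struct node {
    struct node *next;
    long payload;
};

long list_sum(const struct node *p)
{
    long total = 0;
    while (p) {
        total += p->payload;         /* cheap, independent work */
        p = p->next;                 /* loop-carried dependency through memory */
    }
    return total;
}
```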

0 Answers