
How can I find out the dispatch width (front-end pipeline width) and the out-of-order window size of the CPU I'm using?

Specifically for x86, on an Ubuntu 16.04 system.

Peter Cordes
  • 1
    Do you mean the width of the stage that copies uops from the front-end into the back-end? Intel calls that "issue", many others call that "dispatch". You can usually measure that by testing the throughput of `nop` instructions, assuming it's the narrowest part of the pipeline like on modern x86. But if you're using Intel terminology, the "dispatch" is sending uops from the back-end scheduler to execution units. That's hard to measure because it's wider than anything the front-end can keep up with, on Intel CPUs. (AFAIK, every execution port can accept a new uop in the same cycle). – Peter Cordes Jun 04 '20 at 11:21
  • 1
    You can get a loop to sustain 7 back-end uops per clock on Skylake: https://www.agner.org/optimize/blog/read.php?i=415#857, out of the 8 ports. Anyway, are you asking about x86? Or are you hoping for some ISA-independent thing that works on any port of Ubuntu, like ARM, AArch64, x86, POWER, etc.? https://help.ubuntu.com/community/SupportedArchitectures. That seems unlikely; you'd need custom assembly language on every ISA. – Peter Cordes Jun 04 '20 at 11:26
  • 1
    re: out-of-order window size: [Understanding the impact of lfence on a loop with two long dependency chains, for increasing lengths](https://stackoverflow.com/q/51986046). Also http://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ – Peter Cordes Jun 04 '20 at 11:27
  • 1
    If you only care about modern x86, it's easiest to look up known values based on CPUID model number. I don't think these pipeline details are actually queryable with CPUID, so you'd need a table, but official numbers have been published by vendors and usually mostly confirmed by experiments. – Peter Cordes Jun 04 '20 at 11:34
  • Yes, I meant what Intel calls "issue", though I didn't know they use different terminology. And yes, I was asking about x86. I hadn't realized that these widths are kept "secret" in a way and that you have to find ways to measure them. I will look for the tables you mentioned. Anyway, thank you very much, the sources were very informative. – user7375077 Jun 05 '20 at 16:47
  • They're not "secret" like branch-predictor internals are; they're just not programmatically queryable, AFAIK. Intel and AMD do get pipeline info out there, either in their optimization manuals or via presentations about how good their CPUs are. Most software isn't self-tuning, and most of the time there aren't many tuning decisions you'd make differently based on these factors, even for a JIT compiler or ahead-of-time with `gcc -march=native`. – Peter Cordes Jun 05 '20 at 16:53
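
The `nop`-throughput idea from the comments above can be tried with a small generated test program. What follows is only a rough sketch, not something posted by the commenters: the helper names (`make_nop_loop`, `nops_per_cycle`), the loop shape, and the default counts are all made up for illustration. It emits GNU-as (AT&T syntax) source for a static x86-64 Linux program that runs a long loop of `nop`s, and it assumes you read the core cycle count from something like `perf stat`.

```python
def make_nop_loop(nops_per_iter=64, iterations=100_000_000):
    """Return GNU-as (AT&T) source for an x86-64 Linux program that runs
    `iterations` loop iterations of `nops_per_iter` NOPs each.

    Hypothetical helper for illustration only. `iterations` must fit in
    a 32-bit register, because the counter lives in %ecx.
    """
    lines = [
        "    .globl _start",
        "_start:",
        f"    mov ${iterations}, %ecx   # loop counter",
        "1:",
        *(["    nop"] * nops_per_iter),
        "    dec %ecx",
        "    jnz 1b",
        "    mov $60, %eax             # SYS_exit on x86-64 Linux",
        "    xor %edi, %edi            # exit status 0",
        "    syscall",
    ]
    return "\n".join(lines) + "\n"


def nops_per_cycle(nops_per_iter, iterations, cycles):
    """NOPs retired per core clock cycle, given a measured cycle count.

    This ignores the dec/jnz loop overhead, so it lands slightly below
    the real width; more NOPs per iteration shrinks that error.
    """
    return nops_per_iter * iterations / cycles


if __name__ == "__main__":
    # Print the assembly so it can be redirected into a file.
    print(make_nop_loop(), end="")
```

Assuming the output is saved as `nop_loop.s`, something like `gcc -nostdlib -static nop_loop.s -o nop_loop` followed by `perf stat -e cycles,instructions ./nop_loop` should give the cycle count to plug into `nops_per_cycle`. The result only approximates the issue width if that stage really is the narrowest part of the pipeline, as the first comment notes.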

1 Answer


Searching online I found this presentation.

Geo Angelopoulos
  • Sure, for any known microarchitecture, you can just look up published / known facts, like that Intel from Core 2 to Skylake is 4-wide, and 5-wide from Ice Lake onwards. And yes, Skylake's ROB size is 224 uops, RS size 97 uops (unfused domain), and integer vs. FP register file sizes. Similar info for Haswell in https://www.realworldtech.com/haswell-cpu/. However, it seems that Skylake's scheduler (RS) isn't flat / totally unified. Not all 97 uops can be filled up with `add` uops, for example. (BeeOnRope has done some testing of this, or at least commented about it). – Peter Cordes Jun 10 '20 at 20:19
  • 2
    But anyway, the question was how to *measure* this, not look it up. Info for all modern mainstream x86 uarch is fairly well known. – Peter Cordes Jun 10 '20 at 20:21
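
For the look-it-up route, a table keyed by the CPUID vendor/family/model that Linux reports in `/proc/cpuinfo` might look like the sketch below. The pipeline numbers are only the ones quoted in the comments here (Skylake: 4-wide, 224-uop ROB, 97-entry RS; Ice Lake: 5-wide). The model numbers (0x4E and 0x5E for client Skylake, 0x7E for client Ice Lake) are my own assumption and should be checked against a reliable list; the function and table names are hypothetical.

```python
import re

# (vendor_id, family, model) -> pipeline parameters.
# Widths and buffer sizes come from the comments in this thread; the
# model numbers are an assumption and cover only a few client parts.
KNOWN_UARCHES = {
    ("GenuineIntel", 6, 0x4E): {"name": "Skylake", "issue_width": 4,
                                "rob_uops": 224, "rs_uops": 97},
    ("GenuineIntel", 6, 0x5E): {"name": "Skylake", "issue_width": 4,
                                "rob_uops": 224, "rs_uops": 97},
    ("GenuineIntel", 6, 0x7E): {"name": "Ice Lake", "issue_width": 5},
}


def parse_cpu_id(cpuinfo_text):
    """Extract (vendor_id, family, model) from /proc/cpuinfo-style text.

    Returns None if any of the three fields is missing.
    """
    vendor = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.M)
    family = re.search(r"^cpu family\s*:\s*(\d+)", cpuinfo_text, re.M)
    # "model name" does not match: the pattern needs ':' right after
    # "model" and optional whitespace.
    model = re.search(r"^model\s*:\s*(\d+)", cpuinfo_text, re.M)
    if not (vendor and family and model):
        return None
    return vendor.group(1), int(family.group(1)), int(model.group(1))


def lookup_pipeline_info(cpuinfo_text):
    """Return the table entry for this CPU, or None if it is unknown."""
    cpu_id = parse_cpu_id(cpuinfo_text)
    return KNOWN_UARCHES.get(cpu_id) if cpu_id else None


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(lookup_pipeline_info(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo")
```

A real table would need many more entries (server parts, AMD, older Intel), and as noted above this only tells you the published numbers, not measured ones.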