
When I use Intel VTune to profile an application in memory access mode, some instructions show a huge delay in my results, as shown below.

vtune result shows huge delay of some instructions

Obviously, these two register-to-register `sub` instructions cannot take that long by themselves. Furthermore, VTune classifies them as memory bound, which does not seem right to me. I think this is caused by sampling skid, which can be avoided with `:pp` in perf. Can I tune Intel VTune to avoid this, just as in perf?

I tried customizing a copy of the selected analysis and enabling "Profile with Full Intel Processor Trace" or "Collect PEBS data", but neither worked.

  • `sub` gets a lot of counts for the "cycles" event because it has to wait for its inputs to be ready, some of which come from loads that may have missed in cache. It's not just skid; it's a matter of which instruction gets the blame for something that's slow. – Peter Cordes Jan 31 '23 at 16:55
  • You certainly can't get the exact "delay" of each instruction, that isn't even a thing that makes sense. The CPU is always working on multiple instructions in parallel, so there isn't a single cost associated with each instruction that can be added up. [How many CPU cycles are needed for each assembly instruction?](https://stackoverflow.com/q/692718) / [What considerations go into predicting latency for operations on modern superscalar processors and how can I calculate them by hand?](https://stackoverflow.com/q/51607391) – Peter Cordes Jan 31 '23 at 16:58
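The effect described in the comments can be reproduced with a small pointer-chasing loop. The following is a minimal sketch, not the asker's code: `build_cycle` and `chase` are hypothetical helper names, and the assumption is that the compiler lowers `acc -= idx` to a register-only `sub` whose input is the preceding load. With an array much larger than the last-level cache, a profiler would likely attribute many "cycles" samples to that `sub` (or a nearby instruction), even though the load is what is slow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: fill next[] with one cycle over 0..n-1
   (Sattolo's algorithm), so every load address depends on the
   result of the previous load. rand() is used for brevity; its
   range may be too small to shuffle very large arrays well. */
void build_cycle(size_t *next, size_t n, unsigned seed)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j is in [0, i) */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Hypothetical helper: walk the cycle for a number of steps.
   The subtraction only touches registers, but it cannot execute
   until the load above it has delivered idx. */
uint64_t chase(const size_t *next, size_t start, size_t steps)
{
    uint64_t acc = 0;
    size_t idx = start;
    for (size_t s = 0; s < steps; s++) {
        idx = next[idx];        /* load: may miss in cache for large n */
        acc -= (uint64_t)idx;   /* assumed to compile to a register sub */
    }
    return acc;
}
```

To try it, allocate an array of a few hundred megabytes, call `build_cycle` once, run `chase` for many steps, and compare where the samples land against a run with an array that fits in L1. The exact attribution depends on the CPU, the event, and the compiler output, so treat the result as an illustration rather than a measurement of any instruction's cost.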
