When I use Intel VTune to profile an application in memory access mode, some instructions show a huge delay in the results, as shown below.
[Screenshot: VTune result showing a huge delay on some instructions]
Obviously, these two register-only sub instructions should not take that long. Furthermore, the sub instructions are classified as memory bound, which does not seem right to me. I think this is caused by sampling skid, which can be avoided in perf with the :pp modifier. Can I tune Intel VTune to avoid it, just as in perf?
I tried customizing a copy of the selected analysis and enabling "Profile with Full Intel Processor Trace" or "Collect PEBS data", but neither option worked.