
When profiling a program on the "cpu-cycles" event with "perf record -p $pid && perf report", I think the underlying hardware PMC does the following:

  1. Increment the counter on every cycle
  2. Record the "current instruction" when the counter overflows

Which instruction is recorded as the "current instruction" in step 2? At that moment there should be several instructions in flight, in different pipeline stages or execution units.
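
To make the question concrete, here is a toy Python model of the two steps as I understand them. It is only a sketch of my mental model, not how real hardware or perf works: the trace format, the `sample_ips` function and its `skid` parameter are all made up for illustration. `skid` simply shifts the blame a fixed number of instructions later, to show how the choice of "current instruction" changes the profile.

```python
from collections import Counter


def sample_ips(trace, period, skid=0):
    """Toy model of cycle-based sampling (hypothetical, for illustration only).

    trace  -- list of (address, cycles) pairs in program order
    period -- number of cycles between counter overflows
    skid   -- how many instructions later than the "real" one gets blamed
    """
    samples = Counter()
    counter = 0
    for i, (_addr, cycles) in enumerate(trace):
        for _ in range(cycles):
            counter += 1                # step 1: count every cycle
            if counter == period:       # step 2: overflow, record an address
                counter = 0
                blamed = min(i + skid, len(trace) - 1)
                samples[trace[blamed][0]] += 1
    return samples


if __name__ == "__main__":
    # One slow instruction (8 cycles) between two fast ones, repeated.
    trace = [(0x10, 1), (0x14, 8), (0x18, 1)] * 100
    print("no skid:", dict(sample_ips(trace, period=7)))
    print("skid=1: ", dict(sample_ips(trace, period=7, skid=1)))
```

With `skid=0` the samples land on whichever instruction was consuming the cycle at overflow; with `skid=1` they move to the following instruction. The model ignores out-of-order execution entirely, which is exactly the part I am asking about.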

Peter Cordes
  • For non-PEBS, it generates an interrupt, so the same mechanism applies as when deciding which instructions to retire and which to discard when an external interrupt comes in. I think it finishes the oldest instruction in the ROB, unless maybe it was a high-latency cache miss. ([When an interrupt occurs, what happens to instructions in the pipeline?](//stackoverflow.com/q/8902132) has an answer from Andy Glew, one of the architects who worked on Intel P6.) See also [How does perf record (or other profilers) pick which instruction to count as costing time?](//stackoverflow.com/q/69351189) – Peter Cordes Jun 05 '23 at 05:56
  • See also [How does a CPU handle asynchronous interrupts?](https://stackoverflow.com/q/70610364) which mentions what we've seen in the past about letting at least one instruction retire before taking an interrupt. [Reliability of Xcode Instrument's disassembly time profiling](https://stackoverflow.com/q/48369347) has some info about "skids", where the counts accumulate for an instruction slightly later than the one that's actually slow. Or for other reasons, for an instruction waiting to read input from something slow that's been sent to an execution unit already. – Peter Cordes Jun 05 '23 at 06:21
  • Does Windows [performance-monitor] actually let you collect hardware PMU events from the [intel-pmu] like Linux [perf] does? If not, maybe replace that tag with [performance] or something. – Peter Cordes Jun 05 '23 at 06:39
