
I’m analyzing some weird program run-time behaviour with perf, with some help on IRC. By default `perf stat` lists only a few counters, and not the ones of interest, so there is an annoying ping-pong of “include this counter in the output”, me adding it to the list of events passed via `-e`, and pasting the result.

Is there a way to make `perf stat` simply emit all counters, so that a single report is all the experts need to help me?

Joachim Breitner
  • This would also have helped http://stackoverflow.com/questions/14674463/why-doesnt-perf-report-cache-misses I guess. – Joachim Breitner Oct 01 '15 at 10:03
  • Give [*this a try.*](http://stackoverflow.com/a/378024/23771) The fact that you think CPU event counters are going to tell you anything useful at all means maybe you need a different way to look at it. – Mike Dunlavey Oct 01 '15 at 14:48
  • So, after the nominal report, you conclude these events are not interesting. You can't use all counters, and even if you could, I think you would be flooded with too much information. You should first find out whether your program uses too much CPU, whether it has cache- or TLB-related problems, or whether it spends time waiting for I/O (does it access the filesystem heavily, for example?). Does it manage memory allocations badly? Maybe try valgrind and its `massif` tool. – amigadev Oct 02 '15 at 07:46
  • Joachim Breitner, for Intel CPUs there is the open-source [pmu-tools](https://github.com/andikleen/pmu-tools) project from Intel, which can encode, list and select many hardware events of modern CPUs (it also includes the good `toplev.py` script with many **useful** predefined sets). There is also the `showevtinfo` tool in the `perfmon2` / `libpfm4` project, which lists many events for several AMD/Intel CPUs: http://www.bnikolic.co.uk/blog/hpc-prof-events.html "How to monitor the full range of CPU performance events" ("Total events: 2332 available, 166 supported"), but you should not pass them all to stat. – osgx Jan 09 '16 at 04:57

1 Answer


Short answer: no.

Rationale: The performance monitoring unit (PMU) of a CPU is implemented as a number of additional counter registers, so that for each chosen event the assigned register is incremented. The number of these registers is limited, because adding registers to a CPU is "costly". So there are many more events than available PMU registers to count them.

Bottom line: you have to choose a subset of CPU events to monitor with the CPU's PMU.
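The comments below point out that perf can multiplex more events than there are counters, so the real limit is accuracy rather than count. Here is a back-of-the-envelope Python sketch of that trade-off. It assumes idealized round-robin scheduling of N events over k identical counters and ignores fixed counters, event constraints and groups. Under those assumptions each event is counted for roughly k/N of the run, and perf extrapolates the raw count by time_enabled / time_running. The function names are hypothetical, for illustration only.

```python
def active_fraction(num_events: int, num_counters: int) -> float:
    """Approximate share of the run during which each event is actually counted,
    assuming ideal round-robin multiplexing over identical counters (a simplification)."""
    if num_events <= 0 or num_counters <= 0:
        raise ValueError("need at least one event and one counter")
    return min(1.0, num_counters / num_events)


def scaled_count(raw_count: int, time_enabled: float, time_running: float) -> float:
    """Extrapolate a multiplexed raw count to the whole run: raw * enabled / running."""
    if time_running <= 0:
        raise ValueError("event was never scheduled on a counter")
    return raw_count * time_enabled / time_running


if __name__ == "__main__":
    # Illustrative numbers only: 10 events sharing 4 counters.
    share = active_fraction(10, 4)
    print(f"each event counted ~{share:.0%} of the time")
    # An event that was live for 0.4 s of a 1.0 s window and saw 1,000,000 raw events.
    print(scaled_count(1_000_000, time_enabled=1.0, time_running=0.4))
```

The scaled value assumes the event rate was uniform over the whole run. A program with distinct phases violates that assumption, which is one reason multiplexed numbers can be misleading.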

KostaZ
  • That’s what I thought at first as well, but according to [the perf wiki](https://perf.wiki.kernel.org/index.php/Tutorial#multiple_events), “There is no theoretical limit in terms of the number of events that can be provided. If there are more events than there are actual hw counters, the kernel will automatically multiplex them. There is no limit of the number of software events. It is possible to simultaneously measure events coming from different sources.”, so it might not be true. – Joachim Breitner Oct 01 '15 at 12:05
  • Joachim Breitner, there is the `perf stat -d` variant, which tries to enable around 10-12 events. But modern hardware provides simultaneous access to only up to 7 hw events, so 10 events are multiplexed by perf. In my tests any **multiplexing greatly disturbs the results**, and I concluded that I will not use perf's multiplexing... There are hundreds of hw events in modern x86 CPUs, and you should start from the basic sets (those selected by `perf stat` or `perf stat -d`), but run parts of them with `-e ...` without multiplexing, in groups of 5-7 (that is, so that there is no [25%], [50%] or [75%] annotation in the stat output). – osgx Jan 09 '16 at 05:03
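osgx's suggestion of running the events in groups of 5-7, one `perf stat -e` invocation per group, can be scripted. Below is a minimal Python sketch. It assumes the event names listed are available on your machine (they are perf's generic aliases; `perf list` shows what your CPU and kernel actually support), and `./my_program` is a hypothetical stand-in for the program under test. It only builds and prints the command lines; it does not run perf.

```python
import shlex
from typing import List, Sequence


def chunk(events: Sequence[str], group_size: int) -> List[List[str]]:
    """Split the event list into consecutive batches of at most group_size events."""
    if group_size < 1:
        raise ValueError("group_size must be >= 1")
    return [list(events[i:i + group_size]) for i in range(0, len(events), group_size)]


def perf_stat_commands(events: Sequence[str], command: Sequence[str],
                       group_size: int = 5) -> List[List[str]]:
    """Build one `perf stat -e ev1,ev2,... -- command` argv per batch."""
    return [["perf", "stat", "-e", ",".join(batch), "--", *command]
            for batch in chunk(events, group_size)]


if __name__ == "__main__":
    # Example list of generic perf events; which ones exist depends on the CPU/kernel.
    wanted = ["cycles", "instructions", "cache-references", "cache-misses",
              "branches", "branch-misses", "L1-dcache-loads", "L1-dcache-load-misses",
              "LLC-loads", "LLC-load-misses", "dTLB-loads", "dTLB-load-misses"]
    # "./my_program" is a hypothetical path to the program being measured.
    for argv in perf_stat_commands(wanted, ["./my_program"], group_size=5):
        print(shlex.join(argv))
```

Each group comes from a separate run of the program, so the numbers are only comparable across groups if the runs behave alike. Whether a group of 5 really fits on the hardware depends on the CPU, so check that `perf stat` prints no percentage annotation for it.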