
I'm profiling with perf, which currently produces this output:

perf stat -C 3 -B  ./my_app

Performance counter stats for 'CPU(s) 3':

     23,191.79 msec cpu-clock                 #    1.000 CPUs utilized          
           800      context-switches          #   34.495 /sec                   
             2      cpu-migrations            #    0.086 /sec                   
         1,098      page-faults               #   47.344 /sec                   
    55,871,690      cycles                    #    0.002 GHz                    
    30,950,148      stalled-cycles-frontend   #   55.40% frontend cycles idle   
    64,157,302      instructions              #    1.15  insn per cycle         
                                              #    0.48  stalled cycles per insn
    12,845,079      branches                  #  553.863 K/sec                  
       227,892      branch-misses             #    1.77% of all branches   

I'd like to add some specific event counters not listed above.

However, when I list them explicitly, I lose the metadata in the right-hand column, and the default counters all disappear:

 perf stat -e cache-misses -B ./my_app

 Performance counter stats for 'CPU(s) 3':

           207,463      cache-misses                                               

       4.437709174 seconds time elapsed

As you can see, the right-most column has disappeared. I'd like to keep this column, but add specific events.

  1. Is it possible to take the default set of events using -B and add additional events?

  2. If not, if I manually create my list of events, how do I keep the right-most column with the /sec etc?

user997112
    The `/sec` is computed if `task-clock` is one of the events. I don't know of a convenient / short-command-line way to add one extra event, but the simple way is just to include all those events in your list, like `perf stat --all-user -etask-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,uops_executed.thread,machine_clears.memory_ordering` as in the examples in [Why does this code execute more slowly after strength-reducing multiplications to loop-carried additions?](//stackoverflow.com/a/72333152) and [mov-elimination](//stackoverflow.com/a/44193770) – Peter Cordes Aug 18 '22 at 02:26

1 Answer


I don't know of a convenient / short-command-line way to add one extra event. The man page doesn't seem to mention one.

I usually include the default events manually in the --event= list.

perf stat --all-user -etask-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,uops_issued.any,uops_executed.thread

You can use -e or --event= more than once, e.g. -etask-clock,instructions,... -e uops_issued.any,uops_executed.thread, if that makes the command line easier to edit. In bash line editing you can then remove the custom events with a single control-w, instead of pressing alt+backspace to kill one word at a time.
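If retyping the default list gets tedious, a small shell wrapper can do it for you. The following is only a sketch: `perf_stat_plus`, `DEFAULT_EVENTS` and the `PERF` override variable are hypothetical names, not part of perf. The event list is copied from the question's output, with task-clock in place of cpu-clock and without stalled-cycles-frontend, which I'm assuming not every CPU supports.

```shell
# Events perf stat counted by default in the question's output, using
# task-clock as in the command line above. This list is an assumption;
# adjust it to whatever your perf version prints by default.
DEFAULT_EVENTS="task-clock,context-switches,cpu-migrations,page-faults,cycles,instructions,branches,branch-misses"

# perf_stat_plus EXTRA_EVENTS COMMAND [ARGS...]
# Hypothetical helper (not part of perf): counts the default-like events
# plus the comma-separated EXTRA_EVENTS, passed via a second -e option.
# PERF is an illustrative override for the perf binary, e.g. for a dry run.
perf_stat_plus() {
    extra_events="$1"
    shift
    "${PERF:-perf}" stat -e "$DEFAULT_EVENTS" -e "$extra_events" "$@"
}

# Example call (not executed here):
#   perf_stat_plus cache-references,cache-misses ./my_app
```

Because the extra events go in their own -e option, they are still easy to edit or delete on the command line.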

See some of my other answers for examples that include a perf stat command and its actual output, such as the two linked in my comment under the question.

You can add the events for a "metric group" to the default events with -M, for example -M L3_Cache_Access_BW, as shown in How to calculate the L3 cache bandwidth by using the performance counters linux?. That only works for metric groups, though, not for arbitrary single events.

The -d or -dd options can add events to whatever you specified with -e (e.g. perf stat -e uops_executed.thread,task-clock -dd awk 'BEGIN{for(i=0;i<10000000;i++){}}'), but there's no option to add the default events.

On Intel hardware, each core has fixed counters for cycles (clk_unhalted_...) and instructions (inst_retired.any), so always counting those two doesn't reduce the number of events you can count on the programmable counters without multiplexing (e.g. 4 on a Skylake with hyperthreading). perf may not know about that and may treat cycles and instructions just like other events. So if it does have to multiplex, it may sometimes count fewer events at once than it could, giving some events a worse duty cycle than necessary.

The context-switches and similar default events (cpu-migrations, page-faults, task-clock) are software events, counted by the kernel rather than by the PMU, so any number of them can be enabled at once, and they don't interact with multiplexing.


Secondary info annotations are just ratios of two events, printed if both are counted.

  • The /sec secondary info is computed if task-clock or duration-time is one of the events. (Related: Run time and reported cycle counts in linux perf, about how system-wide counting and/or --all-user or instructions:u can lead to a low CPU GHz figure (cycles per second) when few unhalted clock cycles happened (in user-space) across the CPUs you were counting.)
  • For instructions, the default secondary info is IPC, so it's computed if you also measure cycles.
  • For cache-misses, the secondary info is percent of cache-references. (And no, you don't know which level of cache perf will choose to count with cache-misses, or what event cache-references maps to. These names are super generic.) The same applies to other events that count cache misses in specific levels.
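To make the "ratio of two events" point concrete, here is a rough sketch that recomputes two of the annotations by hand from the raw counts in the question's output. The variable names are just for illustration, and awk is used only because plain shell arithmetic is integer-only.

```shell
# Raw counts copied from the perf stat output in the question.
cycles=55871690
instructions=64157302
branches=12845079
branch_misses=227892

# "insn per cycle" is instructions divided by cycles.
ipc=$(awk -v i="$instructions" -v c="$cycles" 'BEGIN { printf "%.2f", i / c }')

# "% of all branches" is branch-misses as a percentage of branches.
miss_pct=$(awk -v m="$branch_misses" -v b="$branches" 'BEGIN { printf "%.2f", 100 * m / b }')

echo "insn per cycle: $ipc"
echo "branch-misses: $miss_pct% of all branches"
```

This gives 1.15 and 1.77, matching the right-hand column in the question. That is why an event listed on its own, such as cache-misses without cache-references, gets no annotation: perf has nothing to divide by.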

The -B option is on by default, and totally orthogonal to all of the event-selection and secondary annotation stuff. It's what uses thousands separators when printing large numbers. Use --no-big-num for the opposite, to get numbers you can copy/paste into a calculator.

Peter Cordes