For some reason, I can't sample (perf record) hardware cache events:

# perf record -e L1-dcache-stores -a -c 100 -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.607 MB perf.data (~26517 samples) ]
# perf script

but I can count them (perf stat):

# perf stat -e L1-dcache-stores -a -- sleep 5
  Performance counter stats for 'sleep 5':

     711,781 L1-dcache-stores                                            

     5.000842990 seconds time elapsed

I tried different CPUs, OS and kernel versions, and perf versions, but the result is the same. Is this expected behaviour? What is the reason? Couldn't perf warn about this?

ysdx
  • Do you have root access? – Leeor Nov 14 '14 at 11:14
  • Yes, the snippets posted in the question were taken as root. It behaves the same with and without root. – ysdx Nov 14 '14 at 11:52
  • same problem, any updates? – papirrin Dec 24 '14 at 23:40
  • @papirrin: No, I tried asking on #perf some time ago but no one was active at that moment. As a workaround you can try sampling with a CPU/arch-specific performance event, using the syntax `cpu/event=0x40,umask=0x128/u` (with suitable values of `event` and `umask`). – ysdx Dec 28 '14 at 22:09
  • ysdx, there actually are some events in `perf report`, but still no output from `perf script` (tested `L1-dcache-stores -a -c 100` on a Core i7 with Ubuntu 14.10). Maybe we should try the `perf script -D` option to debug perf.data and perf script... – osgx Mar 02 '15 at 04:27

1 Answer

The perf evlist -vvv output differs between three perf.data files: one recorded for a cache event, one for the hardware cycles event, and one for a software event (cs, context switches):

echo '2^234567 %2' | perf record -e L1-dcache-stores -c 100 -o cache bc
echo '2^234567 %2' | perf record -e cycles -c 100 -o cycles bc
echo '2^234567 %2' | perf record -e cs -c 100 -o cs bc

 perf evlist -vvv -i cache
L1-dcache-stores: sample_freq=100, type: 3, config: 256, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
 perf evlist -vvv -i cycles
cycles: sample_freq=100, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
 perf evlist -vvv -i cs
cs: sample_freq=100, type: 1, config: 3, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1

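The events have different types. The cycles line shows no type field at all, which is consistent with type 0 (PERF_TYPE_HARDWARE) if perf evlist leaves out zero-valued fields. The type values are defined as: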

enum perf_type_id {
    PERF_TYPE_HARDWARE      = 0,
    PERF_TYPE_SOFTWARE      = 1,
    PERF_TYPE_TRACEPOINT    = 2,
    PERF_TYPE_HW_CACHE      = 3,
    PERF_TYPE_RAW           = 4,
    PERF_TYPE_BREAKPOINT    = 5,

    PERF_TYPE_MAX,          /* non-ABI */
};
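
To make the `type: 3, config: 256` pair from the evlist output concrete, here is a minimal sketch that maps a type number to its enum name and splits a PERF_TYPE_HW_CACHE config into its parts. The helper names (`perf_type_name`, `hw_cache_config`, `decode_hw_cache_config`) are made up for illustration and are not part of perf. The bit layout (cache id in bits 0-7, operation in bits 8-15, result in bits 16-23) is the one described in perf_event_open(2).

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: name of a perf_event_attr.type value,
 * following the order of enum perf_type_id. */
static const char *perf_type_name(uint32_t type)
{
    static const char *const names[] = {
        "PERF_TYPE_HARDWARE",   /* 0 */
        "PERF_TYPE_SOFTWARE",   /* 1 */
        "PERF_TYPE_TRACEPOINT", /* 2 */
        "PERF_TYPE_HW_CACHE",   /* 3 */
        "PERF_TYPE_RAW",        /* 4 */
        "PERF_TYPE_BREAKPOINT", /* 5 */
    };
    size_t count = sizeof(names) / sizeof(names[0]);

    return type < count ? names[type] : "unknown";
}

/* Hypothetical struct: the three components packed into the config
 * of a PERF_TYPE_HW_CACHE event. */
struct hw_cache_config {
    unsigned cache_id;  /* e.g. 0 = L1D */
    unsigned op_id;     /* 0 = read, 1 = write, 2 = prefetch */
    unsigned result_id; /* 0 = access, 1 = miss */
};

/* config = cache_id | (op_id << 8) | (result_id << 16) */
static struct hw_cache_config decode_hw_cache_config(uint64_t config)
{
    struct hw_cache_config c = {
        .cache_id  = (unsigned)(config & 0xff),
        .op_id     = (unsigned)((config >> 8) & 0xff),
        .result_id = (unsigned)((config >> 16) & 0xff),
    };
    return c;
}
```

With these helpers, `perf_type_name(3)` yields PERF_TYPE_HW_CACHE, and `decode_hw_cache_config(256)` yields cache 0 (L1D), op 1 (write), result 0 (access), which matches the L1-dcache-stores event that was recorded.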

perf script has an output table that defines how to print events of each type: http://lxr.free-electrons.com/source/tools/perf/builtin-script.c?v=3.16#L68

/* default set to maintain compatibility with current format */
static struct {
        bool user_set;
        bool wildcard_set;
        unsigned int print_ip_opts;
        u64 fields;
        u64 invalid_fields;
} output[PERF_TYPE_MAX] = {

        [PERF_TYPE_HARDWARE] = {
                .user_set = false,

                .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
                          PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
                          PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
                          PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

                .invalid_fields = PERF_OUTPUT_TRACE,
        },

        [PERF_TYPE_SOFTWARE] = {
                .user_set = false,

                .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
                          PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
                          PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
                          PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

                .invalid_fields = PERF_OUTPUT_TRACE,
        },

        [PERF_TYPE_TRACEPOINT] = {
                .user_set = false,

                .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
                          PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
                          PERF_OUTPUT_EVNAME | PERF_OUTPUT_TRACE,
        },

        [PERF_TYPE_RAW] = {
                .user_set = false,

                .fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
                          PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
                          PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
                          PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

                .invalid_fields = PERF_OUTPUT_TRACE,
        },
};

So there are no instructions for printing any field of samples with type 3 (PERF_TYPE_HW_CACHE), and perf script does not print them. We could try registering this type in the output array and even submit the patch to the kernel.
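
The effect of a missing array entry can be shown with a small standalone model. This is only a sketch and not the perf source: the `OUT_*` bits, `output_model`, `patched_model`, and `prints_anything` are hypothetical names, and it assumes that the printing code consults the `fields` mask of the sample's type. In C, array elements without a designated initializer are zero-filled, so an unlisted type ends up with `fields == 0`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Type ids in the same order as enum perf_type_id. */
enum {
    TYPE_HARDWARE, TYPE_SOFTWARE, TYPE_TRACEPOINT,
    TYPE_HW_CACHE, TYPE_RAW, TYPE_BREAKPOINT, TYPE_MAX
};

/* Hypothetical output-field bits (values chosen for this sketch). */
enum { OUT_COMM = 1 << 0, OUT_TID = 1 << 1, OUT_IP = 1 << 2 };

struct output_opts {
    bool user_set;
    uint64_t fields;
};

/* Model of the existing table: TYPE_HW_CACHE has no initializer,
 * so the compiler zero-fills it and its fields mask is 0. */
static const struct output_opts output_model[TYPE_MAX] = {
    [TYPE_HARDWARE]   = { .fields = OUT_COMM | OUT_TID | OUT_IP },
    [TYPE_SOFTWARE]   = { .fields = OUT_COMM | OUT_TID | OUT_IP },
    [TYPE_TRACEPOINT] = { .fields = OUT_COMM | OUT_TID },
    [TYPE_RAW]        = { .fields = OUT_COMM | OUT_TID | OUT_IP },
};

/* Model of a possible fix: give TYPE_HW_CACHE the same fields as
 * TYPE_HARDWARE (an assumption about what such a patch would do). */
static const struct output_opts patched_model[TYPE_MAX] = {
    [TYPE_HARDWARE]   = { .fields = OUT_COMM | OUT_TID | OUT_IP },
    [TYPE_SOFTWARE]   = { .fields = OUT_COMM | OUT_TID | OUT_IP },
    [TYPE_TRACEPOINT] = { .fields = OUT_COMM | OUT_TID },
    [TYPE_HW_CACHE]   = { .fields = OUT_COMM | OUT_TID | OUT_IP },
    [TYPE_RAW]        = { .fields = OUT_COMM | OUT_TID | OUT_IP },
};

/* True if at least one field is selected for samples of this type. */
static bool prints_anything(const struct output_opts *table, unsigned type)
{
    return type < TYPE_MAX && table[type].fields != 0;
}
```

In this model, `prints_anything(output_model, TYPE_HW_CACHE)` is false while the same call on `patched_model` is true, which mirrors the observation that samples are recorded but perf script prints nothing for them.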

osgx