I have a question related to this one.

I want to (programmatically) measure L3 hits (accesses) and misses on an AMD EPYC 7742 CPU (Zen 2). I run Linux kernel 5.4.0-66-generic on Ubuntu Server 20.04.2 LTS. According to the question linked above, the events rFF04 (L3LookupState) and r0106 (L3CombClstrState) should represent the L3 accesses and misses, respectively. Furthermore, kernel 5.4 should support these events.

However, when measuring with perf, I run into issues. As in the question linked above, if I run numactl -C 0 -m 0 perf stat -e instructions,cycles,r0106,rFF04 ./benchmark, I only measure zeros. If I try numactl -C 0 -m 0 perf stat -e instructions,cycles,amd_l3/r8001/,amd_l3/r0106/, perf complains about "unknown terms". If I use the perf event names, i.e. numactl -C 0 -m 0 perf stat -e instructions,cycles,l3_request_g1.caching_l3_cache_accesses,l3_comb_clstr_state.request_miss, perf outputs <not supported> for these events.

Furthermore, I actually want to measure this using perf's C API. Currently, I dispatch a perf_event_attr with type PERF_TYPE_RAW and config set to, e.g., 0x8001. How do I get the amd_l3 PMU stuff into my perf_event_attr object? Otherwise, it would be equivalent to numactl -C 0 -m 0 perf stat -e instructions,cycles,r0106,rFF04 ./benchmark, which is measuring undefined values.

Peter Cordes
Maxbit

1 Answer

Short answer: try -e rFF0F00000040FF04. This encoding is given in the PPR document for your CPU.

Detailed:

Maybe I can help with the first problem, the one described in your third paragraph. I can't help with the second one, described in your fourth paragraph. Sorry.

Your CPU is 'Family 23 Model 49' (17h and 31h in hex), so I referred to the AMD PPR document for '17h model 31h'. It says to use L3Event[0xFF0F00000040FF04] for 'L3 Accesses' (0xFF0F00000040FF04 is 64 bits wide, which matches the width of the L3 Performance Event Select register shown in the AMD document). man perf-list also shows that AMD uses a raw format that includes bits '32-35'. Although the L3PMCx04 entry in the PPR gives little detail, the L3 Performance Event Select section has some useful information.
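To make the 64-bit value easier to read, here is a minimal C sketch that splits it into fields. The bit positions (event select in bits 7:0, unit mask in 15:8, enable in bit 22, slice mask in 51:48, thread mask in 63:56) are my reading of the L3 Performance Event Select layout and should be treated as an assumption to check against the PPR. The struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed field layout of the Zen 2 L3 Performance Event Select value.
 * Verify the bit positions against the PPR for your model. */
struct l3_event_fields {
    unsigned event_sel;   /* bits 7:0   */
    unsigned unit_mask;   /* bits 15:8  */
    unsigned enable;      /* bit 22     */
    unsigned slice_mask;  /* bits 51:48 */
    unsigned thread_mask; /* bits 63:56 */
};

/* Split a raw 64-bit L3 event value into its assumed fields. */
static struct l3_event_fields decode_l3_event(uint64_t raw)
{
    struct l3_event_fields f;
    f.event_sel   = (unsigned)(raw & 0xFF);
    f.unit_mask   = (unsigned)((raw >> 8) & 0xFF);
    f.enable      = (unsigned)((raw >> 22) & 0x1);
    f.slice_mask  = (unsigned)((raw >> 48) & 0xF);
    f.thread_mask = (unsigned)((raw >> 56) & 0xFF);
    return f;
}

/* Print the decoded fields on one line. */
static void print_l3_event(uint64_t raw)
{
    struct l3_event_fields f = decode_l3_event(raw);
    printf("event=0x%02x umask=0x%02x enable=%u slicemask=0x%x threadmask=0x%02x\n",
           f.event_sel, f.unit_mask, f.enable, f.slice_mask, f.thread_mask);
}
```

Under that layout, decode_l3_event(0xFF0F00000040FF04ULL) gives event 0x04, unit mask 0xFF, the enable bit set, and all slice and thread mask bits set. In other words, the low 16 bits are the FF04 from the question and the rest selects every slice and thread.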

I tested on my own CPU, a Ryzen 7 4800H, which is a 17h_60h family Renoir processor. It is also Zen 2, and judging from these two pieces of source code (one, which lists some encodings for AMD CPUs, and two), the Zen 2 configuration should be almost the same. My CPU does not have amd_l3 support, so I used ls_dc_accesses as a stand-in. 729 is the code for 'All DC Accesses' in the AMD document for my CPU family: PMCx029 gives the EventCode (0x29) and the next 8 bits give the UMask (0x07). The same value can also be found in source two above (in the PPR for your 17h_31h family, p. 182, the number is 0x430729):

$ ls /sys/devices/*/format | grep amd_l
$ perf list | grep ls_dc_accesses -A 1
  ls_dc_accesses
       [Number of accesses to the dcache for load/store references]
$ perf stat -e r729 ls          
...
 Performance counter stats for 'ls':

         1,097,092      r729                       
$ perf stat -e ls_dc_accesses ls
...
 Performance counter stats for 'ls':

           974,666      ls_dc_accesses   
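The relation between r729 and 0x430729 can be written out as a small C sketch. It assumes the usual AMD core PMU event-select layout (EventSelect[7:0] in bits 7:0, UnitMask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22, EventSelect[11:8] in bits 35:32). The macro and function names are hypothetical, not taken from perf or the kernel.

```c
#include <stdint.h>

/* Control bits of the assumed AMD core PMU event-select layout. */
#define AMD_EVSEL_USR (1ULL << 16) /* count in user mode   */
#define AMD_EVSEL_OS  (1ULL << 17) /* count in kernel mode */
#define AMD_EVSEL_EN  (1ULL << 22) /* counter enable       */

/* Build the value that goes after "r" in perf's raw syntax from a
 * 12-bit event code (PMCxNNN) and an 8-bit unit mask. */
static uint64_t amd_core_raw(unsigned event_code, unsigned umask)
{
    uint64_t low  = event_code & 0xFF;       /* EventSelect[7:0]  -> bits 7:0   */
    uint64_t high = (event_code >> 8) & 0xF; /* EventSelect[11:8] -> bits 35:32 */
    return low | ((uint64_t)(umask & 0xFF) << 8) | (high << 32);
}
```

With these definitions, amd_core_raw(0x29, 0x07) is 0x729, and OR-ing in AMD_EVSEL_USR, AMD_EVSEL_OS and AMD_EVSEL_EN gives 0x430729, the full register value from the PPR. As far as I understand, perf only needs the event and unit-mask part, because the kernel manages the enable and privilege bits itself.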

Not everyone has an EPYC CPU, so it is hard for others to reproduce your problem and see where it goes wrong. If possible, please add more details to the question.
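This does not settle the C API part of the question, but as a starting point: PMUs other than the core one are normally selected by putting their dynamic type id, read from /sys/bus/event_source/devices/<pmu>/type, into perf_event_attr.type instead of PERF_TYPE_RAW. The sketch below is untested on EPYC hardware. It assumes the PMU is exposed under the name amd_l3 and that it has to be opened system-wide on a CPU (pid = -1), which usually requires root or a relaxed perf_event_paranoid setting. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the dynamic PMU type id from sysfs.
 * Returns -1 if the PMU is not present or the file cannot be parsed. */
static int read_pmu_type(const char *pmu_name)
{
    char path[256];
    int type = -1;
    snprintf(path, sizeof(path),
             "/sys/bus/event_source/devices/%s/type", pmu_name);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &type) != 1)
        type = -1;
    fclose(f);
    return type;
}

/* Open a counting (non-sampling) event on the named PMU, bound to one
 * CPU rather than to a task. Returns a file descriptor, or -1 on error. */
static int open_pmu_counter(const char *pmu_name, uint64_t config, int cpu)
{
    int type = read_pmu_type(pmu_name);
    if (type < 0)
        return -1;

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.size   = sizeof(attr);
    attr.type   = (uint32_t)type; /* dynamic PMU id, not PERF_TYPE_RAW */
    attr.config = config;         /* fields as listed under .../format/ */

    /* pid = -1 with cpu >= 0 requests a system-wide counter on that CPU. */
    return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Read the current 64-bit count. Returns 0 on success, -1 on error. */
static int read_counter(int fd, uint64_t *value)
{
    return read(fd, value, sizeof(*value)) == (ssize_t)sizeof(*value) ? 0 : -1;
}
```

A call such as open_pmu_counter("amd_l3", 0xFF04, 0) would then request event 0x04 with unit mask 0xFF on CPU 0. Whether the slice and thread mask bits also have to be supplied in config depends on the kernel version, so check which fields /sys/bus/event_source/devices/amd_l3/format/ lists before relying on this.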

Hope this can help you.

zg c