
I am trying to measure how many memory references miss every level of CPU cache and have to fetch a cache line from main memory. I have a very simple program that loads 100 million 4-byte integers into an array and then either scans it or probes it randomly. I measure the elapsed time, and then use perf to report various cache-related events: LLC-loads, LLC-load-misses, LLC-stores, LLC-store-misses. I am using Pop!_OS 18.10 (a variant of Ubuntu 18.10).

I run the program three ways:

1) Just load the array (100 million integers).

2) Load the array and scan it in physical order.

3) Load the array and read 100 million random array locations.

#3 is 40x slower than #2, which is not surprising.
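
For reference, the three run modes above can be sketched as follows. This is only an illustration of the access patterns, not the original program, which is not shown here. The function names (`load`, `scan`, `probe`, `run`) and the command-line arguments are hypothetical. In Python the interpreter's own memory traffic would add to the hardware counters, so a real measurement would need a compiled language.

```python
import random
import sys
from array import array


def load(n):
    """Mode 1: fill an array with n integers ("i" is a C int, 4 bytes on common platforms)."""
    return array("i", range(n))


def scan(a):
    """Mode 2: read every element in physical (index) order."""
    total = 0
    for x in a:
        total += x
    return total


def probe(a, count, seed=0):
    """Mode 3: read `count` randomly chosen array locations."""
    rng = random.Random(seed)
    n = len(a)
    total = 0
    for _ in range(count):
        total += a[rng.randrange(n)]
    return total


def run(mode, n):
    """Always load the array; modes 2 and 3 then scan or probe it."""
    a = load(n)
    if mode == 2:
        return scan(a)
    if mode == 3:
        return probe(a, n)
    return len(a)


if __name__ == "__main__":
    # Hypothetical usage: python modes.py <mode> <n>
    mode = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    n = int(sys.argv[2]) if len(sys.argv) > 2 else 1_000_000
    print(run(mode, n))
```

Returning the running total keeps the reads from being trivially dead work, and the fixed seed makes the random probe sequence repeatable across runs.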

I am having trouble knowing both which perf events to examine and how to interpret the results:

  • I discovered the LLC-* events by googling, but they are not mentioned by "perf list".

  • I subtract the event counts of the load-only run (#1) from those of the load-and-scan runs (#2, #3). The numbers are generally lower for the physical scan (#2) than for the random access (#3). But from reading the documentation, and looking at the numbers, I don't really understand what the various events represent.

  • Does perf count events or does it sample them? If it's a true count, then I really can't make sense of the numbers I'm seeing. (E.g. the number of LLC-load-misses events doesn't match the number of cache line transfers that should be needed.)
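
To make "the number of cache line transfers that should be needed" concrete, here is a rough back-of-the-envelope calculation. It assumes 64-byte cache lines, which is not stated in the question, and it ignores prefetching and any lines that happen to stay cached. The helper names are hypothetical.

```python
INT_BYTES = 4             # from the question: 4-byte integers
N_INTS = 100_000_000      # from the question: 100 million integers
CACHE_LINE_BYTES = 64     # assumption: typical line size, not stated in the question


def lines_for_scan(n_ints, int_bytes=INT_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Cache lines touched by one sequential pass over the array."""
    total_bytes = n_ints * int_bytes
    return -(-total_bytes // line_bytes)  # ceiling division


def max_lines_for_probes(n_probes):
    """Upper bound for random probes: at most one line fetched per probe."""
    return n_probes


print(lines_for_scan(N_INTS))        # 6250000
print(max_lines_for_probes(N_INTS))  # 100000000
```

Under these assumptions the sequential scan touches about 6.25 million lines, while the random probes could need up to 100 million line fetches. Those are the numbers to compare against the differences between runs.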

Jack Orenstein
  • Are you using an Intel processor? See: https://stackoverflow.com/questions/55035313/how-does-linux-perf-calculate-the-cache-references-and-cache-misses-events. Note that `perf stat` counts events and `perf record` samples events. – Hadi Brais Mar 09 '19 at 22:43
  • Yes, Intel i7-8565U – Jack Orenstein Mar 20 '19 at 18:29
