I am trying to measure how many memory references miss every level of CPU cache and have to fetch a cache line from main memory. I have a very simple program that loads 100 million 4-byte integers into an array and then either scans it sequentially or probes it at random locations. I measure the elapsed time and then use perf to report various cache-related events: LLC-loads, LLC-load-misses, LLC-stores, and LLC-store-misses. I am running Pop!_OS 18.10 (a variant of Ubuntu 18.10).
I run the program three ways:
1) Just load the array (100M integers).
2) Load the array, then scan it in sequential order.
3) Load the array, then read 100M random array locations.
#3 is 40x slower than #2, which is not surprising.
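For reference, here is a minimal sketch of the kind of program I mean; the file/mode names and the use of rand() are illustrative rather than my exact code, and the timing is omitted:

```c
#include <stdlib.h>
#include <string.h>

#define N 100000000UL   /* 100 million 4-byte ints, ~400 MB */

int main(int argc, char **argv)
{
    int *a = malloc(N * sizeof *a);
    if (!a) return 1;

    /* Every run loads (initializes) the array; this alone is run #1. */
    for (size_t i = 0; i < N; i++)
        a[i] = (int)i;

    volatile long sink = 0;  /* keeps the compiler from discarding the reads */

    if (argc > 1 && strcmp(argv[1], "seq") == 0) {
        /* Run #2: scan the array in sequential order. */
        for (size_t i = 0; i < N; i++)
            sink += a[i];
    } else if (argc > 1 && strcmp(argv[1], "rand") == 0) {
        /* Run #3: 100 million reads at random array locations. */
        for (size_t i = 0; i < N; i++)
            sink += a[(size_t)rand() % N];
    }

    free(a);
    return (int)(sink & 1);
}
```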
I am having trouble both with knowing which perf events to examine and with interpreting the results:
I discovered the LLC-* events by googling, but they are not mentioned by "perf list".
I subtract the event counts of the load-only run (#1) from those of the load-and-scan runs (#2 and #3). The numbers are generally lower for the sequential scan (#2) than for the random access (#3). But even after reading the documentation and looking at the numbers, I don't really understand what the various events represent.
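(For reference, the invocation I use is roughly `perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses ./cachetest <mode>`; I am not certain those event names map onto the right hardware counters for my CPU, since I only found them by googling.)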
Does perf count events, or does it sample them? If it is a true count, then I really can't make sense of the numbers I'm seeing. (For example, the LLC-load-misses count doesn't match the number of cache-line transfers that should be needed.)
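For scale, here is my back-of-envelope expectation, assuming 64-byte cache lines (which I believe my CPU uses): 100 million × 4 bytes ≈ 400 MB, so the sequential scan should need roughly 400 MB / 64 B ≈ 6.25 million cache-line transfers, while the 100 million random probes into a 400 MB array could miss on nearly every access, i.e. approach 100 million transfers. The LLC-load-misses counts I see don't come close to matching either figure.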