
I am running ocount on our program to count L2 cache read events, and we get these results:

Event                               Count                    % time counted
l2_rqsts:all_demand_data_rd         14,418,959,276           80.01
l2_rqsts:demand_data_rd_hit         6,297,000,387            80.00
l2_rqsts:demand_data_rd_miss        6,104,577,343            80.00
l2_rqsts:l2_pf_hit                  667,709,870              80.01
l2_rqsts:l2_pf_miss                 1,641,991,158            79.99

However, we have no idea whether these results indicate total cache thrashing or not.

What do you consider a good hit/miss ratio for an L2 cache?

I expect it depends highly on the CPU architecture and the application requirements, but is there a generally admissible value for it?

rvlander
  • By itself, a cache hit/miss doesn't really tell you anything other than potential optimizations. – Jason Nov 19 '15 at 21:53
  • Then, how do you know that cache misses are the bottleneck of your app? – rvlander Nov 20 '15 at 09:48
  • Cache hit/miss doesn't tell you what types of cache misses you have. There's more than one (compulsory, capacity, conflict, etc...). – Jason Nov 20 '15 at 14:39
  • @rvlander - For a high-level bottleneck analysis you can use profiling tools from gprof to vtune. The TopDown methodology based on performance counters could also be useful (not sure if it's supported in oprofile) – Leeor Nov 27 '15 at 20:45
  • It depends very much on the app, e.g. I'd highly recommend a persistent object cache for WordPress (you could use memcached with the memcached-redux plugin) - I typically get a 95%+ hit rate for that. You obviously want to set the cache size appropriately as well, in memcached's cache, see how many evictions you have. – William Turrell Mar 06 '18 at 08:43

1 Answer


It depends on the application. At the extremes:

  • If every memory access is to the same location, or is strided and fits within the cache level of interest (say 256 KB total size for a typical L2 cache) without any evictions due to associativity conflicts, the app can approach a 100% hit rate.
  • If memory accesses happen in a region much larger than the cache and are truly random, you could probably end up well under 50% hit rate (I'm not sure of an analytic way to arrive at an exact number but I would guess it would depend on the probability distribution of hitting a given line).
  • You could intentionally construct a pathological case where your app alternates memory accesses between locations that happen to map to the same cache set, given how your processor handles associativity. In this case the hit rate would approach 0%.

I doubt there's any work on an analytic model to predict what kinds of values you might see for a more realistic workload, but there have definitely been some profiles run on common benchmarks. For example: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.152.3943&rep=rep1&type=pdf. These folks show a rate of between 20 and 50 misses per thousand instructions (MPKI) on the mcf workload from SPECcpu2000. Here's a description of that workload: https://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf.html. It may or may not look to the memory subsystem like what you're interested in optimizing.

Back to the point of why you might be asking the question in the first place: if other profiling data shows that you're bound more by cache or memory accesses than by arithmetic, locking, etc., then you might pick a heuristic threshold, where if you're under, say, an 80% or 95% hit rate, it might be worth trying to optimize cache access.

Aaron Altman