I'm using `perf_event_open` to read L1/L2/L3 cache counters via the raw performance events defined here. I am seeing surprising behavior:
- L1D misses < L2D accesses
- L2D misses < L3D accesses
- L1D misses < L2D misses
- L2D misses < L3D misses
Based on my understanding of caching, I would expect the opposite inequality at each level: L1 misses > L2 accesses, since an L2 access should only happen on an L1 miss. There are also L1I access/miss counters, but even L1D misses + L1I misses < L2D accesses. Could L1D replacements account for this difference? That still would not explain why L2D cache misses < L3D cache accesses. Is there some hardware caching mechanism that would cause this behavior?
The ultimate goal is to estimate DDR-to-CPU bandwidth by capturing the global L3 miss count, but the first step is understanding the counting behavior at each cache level.