The Arm Cortex-A72 TRM specifies that the L1 has a 'fill buffer' and that the L2 has a 'fill/evict queue', but the manual does not explain anywhere what they do. Am I right in assuming the following?

  1. The fill buffer temporarily holds a cache line before it is written into the cache (but why?)
  2. The evict queue buffers a cache line after it is evicted from the cache and before it is written back to memory.
user2927392

1 Answer

A "fill buffer" is probably a structure that tracks an incoming cache line between the request being sent out and the data coming back. Loads from that cache line can attach themselves to it, so they get notified when the data comes back (so it can get written to registers and, if needed, forwarded to instructions waiting to read those registers).

That's what Line Fill Buffers do in Intel CPUs. (Although in Intel CPUs, LFBs are used for stores as well.)
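
To make the "loads attach themselves to it" idea concrete, here is a toy software model of such a buffer. It is only a sketch of the general concept, not a description of how Cortex-A72 or any Intel core implements it: the class names, the entry count, and the 64-byte line size are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache-line size for this toy model


@dataclass
class FillBufferEntry:
    """Tracks one outstanding cache-line request (hypothetical structure)."""
    line_addr: int
    waiting_loads: list = field(default_factory=list)  # (dest_reg, byte offset)


class FillBuffer:
    """Toy model: one entry per cache line that has been requested but not filled."""

    def __init__(self, num_entries=4):  # entry count is an arbitrary assumption
        self.num_entries = num_entries
        self.entries = {}        # line address -> FillBufferEntry
        self.requests_sent = []  # line addresses requested from the next level

    def load_miss(self, addr, dest_reg):
        """Record a load that missed in L1. Returns False if the buffer is full."""
        line_addr = addr - addr % LINE_BYTES
        entry = self.entries.get(line_addr)
        if entry is None:
            if len(self.entries) >= self.num_entries:
                return False  # no free entry: the load has to stall or retry
            entry = FillBufferEntry(line_addr)
            self.entries[line_addr] = entry
            self.requests_sent.append(line_addr)  # one request per line
        # A later miss to the same line attaches here instead of re-requesting.
        entry.waiting_loads.append((dest_reg, addr - line_addr))
        return True

    def fill(self, line_addr, data, registers):
        """Line data arrived: complete every attached load, then free the entry."""
        entry = self.entries.pop(line_addr)
        for dest_reg, offset in entry.waiting_loads:
            registers[dest_reg] = data[offset]
        return data  # in hardware, the line would now be written into the L1 array
```

In this model two loads that miss on the same line produce a single outgoing request, and both destination registers are written when `fill` is called. That also suggests one answer to the "but why?" in the question: the entry has to exist while the data is in flight so that later accesses to the same line have something to attach to.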

Peter Cordes
  • It is not clear whether loads get "notified" when a cache line returns -- the evidence I have seen suggests that Intel just retries the loads until they can complete. It is likely more complex than that, but this part of the microarchitecture is only sometimes visible -- e.g., in the Sandy Bridge EP, floating-point arithmetic operations combined with loads incremented the FP operation counter every time the load retried, so the overcounting was proportional to the average cache miss latency. – John D McCalpin Aug 17 '22 at 15:27
  • @JohnDMcCalpin: Oh right, yeah Intel CPUs replay uops that have a load result as an input, whether that's another load or an ALU instruction. But the cache-miss load uops themselves don't get replayed, so the data does end up committed to a physical register, and available on the bypass-forwarding bus, without extra work by a load execution unit. It seems the load uop itself can leave the RS after dispatching to an execution unit, before the load completes. [Are load ops deallocated from the RS when they dispatch, complete or some other time?](https://stackoverflow.com/q/59905395) – Peter Cordes Aug 17 '22 at 17:24
  • @JohnDMcCalpin: Some earlier SO answers reflect a misunderstanding that BeeOnRope (Travis Downs) and I had initially, based on testing a chain of dependent loads: we were assuming that the cache-miss load uops themselves had to get replayed to pull the data in when it became available. Later testing with dependent ALU uops revealed our mistake. [About the RIDL vulnerabilities and the "replaying" of loads](https://stackoverflow.com/q/56187269) includes my current understanding of replaying dependent uops. – Peter Cordes Aug 17 '22 at 17:33
  • Cortex-A72 is an out-of-order core (https://en.wikipedia.org/wiki/ARM_Cortex-A72); I didn't check before writing this quick answer; I tried to simplify things to the point where it would still be correct for an in-order core. Regardless, this is an ARM question, and its mechanism for executing dependent uops after a cache-miss load completes might be totally different from Intel's. A more power-sensitive design might avoid replays. Intel's optimistic dispatch with replay mechanism, I think, dates back to P6 and/or Netburst, designed before power (density) became the limiting factor. – Peter Cordes Aug 17 '22 at 18:58