
I'm learning more about the theoretical side of CPUs, and I read that a cache can be used to fetch a line (block) of memory from RAM into storage closer to the CPU, where it can be accessed more quickly. (I think it takes fewer clock cycles because the CPU doesn't need to move the entire address of the next word into a register, and also because the cache is physically closer to the CPU.)
But now I'm not clear on how this is actually implemented. The CPU is connected to RAM through a data bus that could be 32 or 64 bits wide in modern machines. Yet an L3 cache can in some cases be as large as 32 MB, and I'm pretty sure there aren't millions of data lines running from RAM to the CPU's cache. Even the comparatively tiny L1 cache of only a few KB would take hundreds or even thousands of clock cycles to fill from RAM through such a narrow data bus.
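The estimate above can be checked with some simple arithmetic. This is a sketch under stated assumptions: a 64-byte cache line, a 64-bit (8-byte) DDR4 data bus, and a hypothetical 32 KiB L1d; the 32-byte and 64-byte widths are the L3 <-> L2 and L2 <-> L1d figures quoted for Intel Sandybridge-family in the comments. It counts bus transfers, not clock cycles, because the cycle cost of one transfer differs between DRAM and on-chip paths.

```python
# Assumed sizes, in bytes (illustrative, not tied to one specific CPU).
LINE_BYTES = 64          # typical cache-line size
L1D_BYTES = 32 * 1024    # hypothetical 32 KiB L1d

# Data-path widths in bytes, based on the figures quoted in the comments.
BUSES = {
    "DRAM (64-bit DDR4 channel)": 8,
    "L3 <-> L2 (32-byte path)": 32,
    "L2 <-> L1d (full-line path)": 64,
}


def transfers(nbytes: int, bus_bytes: int) -> int:
    """Number of bus transfers needed to move nbytes over a bus_bytes-wide path."""
    return -(-nbytes // bus_bytes)  # ceiling division without floats


for name, width in BUSES.items():
    per_line = transfers(LINE_BYTES, width)
    whole_l1d = transfers(L1D_BYTES, width)
    print(f"{name}: {per_line} per 64-byte line, {whole_l1d} to fill all of L1d")
```

For the DRAM path this gives 4096 transfers to fill the whole L1d, so the question's "thousands" is right. But the core never needs the whole cache at once: it requests one 64-byte line at a time, which is only 8 transfers (one burst) on a 64-bit DDR4 channel.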

So what I'm trying to understand is: how exactly is a CPU cache implemented so that it can transfer so much information while still being efficient? Are there any examples of (relatively) simple CPUs from the last few decades that I could study to learn how they implemented that part of the architecture?

aradarbel10
  • *The CPU is connected to RAM through a data register that could be 32 or 64 bits wide in modern machines* - nope, not a register. A DDR4 data bus is 64 bits wide, but the connection between L3 and L2 is often wider, like 32 *bytes* wide (e.g. in Intel Sandybridge-family), and the L2 <-> L1d bus width may be a full line wide (64 bytes). [How can cache be that fast?](https://electronics.stackexchange.com/a/329955) and https://www.realworldtech.com/sandy-bridge/2/. – Peter Cordes Apr 02 '21 at 18:56
  • Remember, cache works in units of *lines*, usually 64 bytes, so yes it takes thousands of cycles to fill L1d from DRAM, but the CPU core can be working on that data as it arrives. (Including storing results of computations, with [a store buffer decoupling cache-miss stores from the pipeline](https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram)) – Peter Cordes Apr 02 '21 at 18:58
  • Also; [What Every Programmer Should Know About Memory?](https://stackoverflow.com/q/8126311) – Peter Cordes Apr 02 '21 at 18:58
  • @PeterCordes so it does indeed take a lot of time for the cache to fill up? Is that the reason cache misses are so expensive? – aradarbel10 Apr 02 '21 at 19:24
  • Yes. Cache misses are usually expensive because of latency, more so than bandwidth. L3 misses take so long that even [large out-of-order execution buffers (ROB) can't fully hide them](https://blog.stuffedcow.net/2013/05/measuring-rob-capacity/). When looping over an array, enough loads can be in flight to get pretty good bandwidth, thanks to hardware prefetch and OoO exec, so you're not having to wait for one cache miss to fully resolve before even starting the next load. That's unlike a linked list, where the CPU can't get the next load address until the previous load arrives. – Peter Cordes Apr 02 '21 at 19:30
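The latency-versus-bandwidth point in the last comment can be put in numbers with Little's law: sustained throughput = (data in flight) / latency. This is only an illustrative sketch: the 80 ns DRAM latency and the figure of 10 misses in flight are hypothetical values chosen for round numbers, not measurements of any particular CPU, and `bandwidth_gb_per_s` is a made-up helper.

```python
LINE_BYTES = 64      # bytes brought in per cache miss
LATENCY_NS = 80.0    # hypothetical DRAM load latency (assumption)


def bandwidth_gb_per_s(misses_in_flight: int,
                       latency_ns: float = LATENCY_NS,
                       line_bytes: int = LINE_BYTES) -> float:
    """Little's law: throughput = concurrency / latency.

    Bytes per nanosecond equals GB/s (1e9 bytes per 1e9 ns).
    """
    return misses_in_flight * line_bytes / latency_ns


# Linked list: the next address comes out of the current load,
# so only one miss can be outstanding at a time.
print(f"pointer chasing: {bandwidth_gb_per_s(1):.1f} GB/s")

# Array loop: addresses are known in advance, so OoO exec and the
# hardware prefetcher can keep several misses outstanding (10 is an assumption).
print(f"array streaming: {bandwidth_gb_per_s(10):.1f} GB/s")
```

With these assumed numbers the linked-list walk is limited to 0.8 GB/s and the array loop reaches 8.0 GB/s, even though every individual miss has the same latency. The only difference is how many misses overlap.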

2 Answers


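As it turns out, the buses between cache levels are much wider than the DRAM bus (for example, often 32 bytes between L3 and L2 in Intel Sandybridge-family, and possibly a full 64-byte line between L2 and L1d), and the cache is filled one line at a time rather than all at once. Thanks to Peter for pointing this out in the comments and for providing useful links for further reading.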

aradarbel10

Since you want to see how a CPU cache works together with RAM (main memory), here's a helpful direct-mapped cache simulator where you can set the sizes of RAM and cache and watch how they interact:

https://www3.ntu.edu.sg/home/smitha/ParaCache/Paracache/dmc.html
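To give a feel for what such a simulator computes, here is a minimal direct-mapped cache model. It is a sketch under assumptions: 64-byte lines and a hypothetical 4 KiB cache (64 lines). The byte address is split into offset, index and tag, and a lookup hits only if the one line at that index holds the same tag. The class and method names are made up for illustration and are not taken from the linked site.

```python
class DirectMappedCache:
    """Toy direct-mapped cache: each address maps to exactly one line slot."""

    def __init__(self, cache_bytes: int = 4096, line_bytes: int = 64):
        assert cache_bytes % line_bytes == 0
        self.line_bytes = line_bytes
        self.num_lines = cache_bytes // line_bytes
        self.tags = [None] * self.num_lines  # None means the line is invalid (empty)
        self.hits = 0
        self.misses = 0

    def split(self, addr: int) -> tuple[int, int, int]:
        """Return (tag, index, offset) for a byte address."""
        offset = addr % self.line_bytes
        line_number = addr // self.line_bytes
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        return tag, index, offset

    def access(self, addr: int) -> bool:
        """Look up addr; on a miss, fill the line (evicting whatever was there)."""
        tag, index, _ = self.split(addr)
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag  # the whole line is brought in, not just one byte
        return False


cache = DirectMappedCache()
# Reading 256 consecutive bytes: one miss per 64-byte line, hits for the rest.
for a in range(256):
    cache.access(a)
print(cache.hits, cache.misses)  # 252 4
```

The weakness of direct mapping shows up if you alternate between addresses 0 and 4096 in this model: both map to index 0 with different tags, so every access misses even though the rest of the cache is empty.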

pushpa
  • Modern CPUs don't use direct-mapped cache; that site does include a 4-way set associative simulator (https://www3.ntu.edu.sg/home/smitha/ParaCache/Paracache/sa4.html) which is more like [Which cache mapping technique is used in intel core i7 processor?](https://stackoverflow.com/q/49092541) and other modern microarchitectures (although real L1d caches are often 8-way associative.) – Peter Cordes Mar 27 '22 at 19:32
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – shmee Mar 31 '22 at 22:30
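For comparison with the direct-mapped case, here is a minimal N-way set-associative model with LRU replacement, in the spirit of the 4-way simulator mentioned above. Sizes are assumptions (64-byte lines, 4 KiB total, 4 ways, so 16 sets), and real CPUs often use a pseudo-LRU approximation rather than exact LRU. The class is hypothetical and only meant for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy N-way set-associative cache with exact LRU replacement per set."""

    def __init__(self, cache_bytes: int = 4096, line_bytes: int = 64, ways: int = 4):
        assert cache_bytes % (line_bytes * ways) == 0
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = cache_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, ordered least- to most-recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        line_number = addr // self.line_bytes
        index = line_number % self.num_sets
        tag = line_number // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


cache = SetAssociativeCache()
# Addresses 0 and 4096 collide in a 4 KiB direct-mapped cache, but here they
# share a set with room for 4 lines, so only the first touch of each misses.
for _ in range(4):
    cache.access(0)
    cache.access(4096)
print(cache.hits, cache.misses)  # 6 2
```

Setting `ways=1` turns this model back into a direct-mapped cache, where the same access pattern misses every time. Associativity does not remove conflicts entirely: cycling through five lines that map to the same 4-way set still misses on every access under LRU.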