
Is there ever a time when the processor uses RAM directly for its operations, without the involvement of cache memory? Or does the processor always take data from the cache, with the cache fetching it from RAM?

2 Answers


Not normally, no, unless software purposely bypasses or disables cache on modern CPUs.

With latency to DRAM being maybe 70 ns, that's 280 cycles on a 4GHz CPU. That's enough time for a Skylake CPU to execute ~1100 instructions at 4 instructions per cycle. But its limit on memory-parallelism is about 12 outstanding cache misses. So cache is very very important for performance, even with out-of-order execution.

Fun fact, though: Yes, the MMU in P5 Pentium CPUs and earlier bypassed cache when accessing the page tables after a TLB miss. Source: an answer from Andy Glew, former Intel CPU architect who worked on P6: Are page table walks cached?

Modern CPUs including modern x86 do access page tables through their data caches, though: What happens after a L2 TLB miss?


x86 has movnt instructions for cache-bypassing stores, to avoid cache pollution for large memsets. There are bandwidth tradeoffs. See Enhanced REP MOVSB for memcpy for more about NT stores and no-RFO stores from rep movsb on CPUs with the ERMSB feature. Other architectures probably have similar features.
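As a concrete illustration, here is a minimal sketch of an NT-store memset using the SSE2 `_mm_stream_si128` intrinsic (which compiles to `movntdq`). The function name is made up for this example; it assumes a 16-byte-aligned destination and a size that's a multiple of 16:

```c
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <stddef.h>

// Hypothetical helper: zero a large buffer with non-temporal stores,
// bypassing the cache to avoid evicting useful data.
void nt_memset_zero(void *dst, size_t bytes) {
    __m128i zero = _mm_setzero_si128();
    char *p = (char *)dst;
    // dst is assumed 16-byte aligned, bytes a multiple of 16.
    for (size_t i = 0; i < bytes; i += 16)
        _mm_stream_si128((__m128i *)(p + i), zero);
    _mm_sfence();  // order the weakly-ordered NT stores before returning
}
```

The `sfence` at the end matters: NT stores are weakly ordered, so without it, later code (or another thread) could observe the stores out of order.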


You can also set a range of physical address space to be uncacheable. (Or on x86, per 4k virtual page with Page Attribute Table settings in the page table entries.)

Normally this is done for MMIO regions (memory-mapped I/O), where instead of DRAM the "memory" is actually I/O registers on devices like network cards. So every load/store is a visible side-effect, and speculative prefetch must be disallowed. (And every store must result in a separate off-core write transaction, e.g. a PCIe message.)
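This is why device-driver code accesses MMIO registers through `volatile` pointers: the uncacheable memory type stops the hardware from caching or prefetching, while `volatile` stops the compiler from reordering or eliding the accesses. A minimal sketch (the register name and address are made up for illustration; real drivers get the mapping from the OS):

```c
#include <stdint.h>

// Hypothetical NIC status register, mapped uncacheable by the OS at a
// made-up physical address. Do not dereference this on real hardware.
#define NIC_STATUS_REG ((volatile uint32_t *)0xFEB00000u)

uint32_t read_nic_status(void) {
    // volatile forces the compiler to emit exactly one load here;
    // the UC memory type forces the CPU to send it to the device.
    return *NIC_STATUS_REG;
}
```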


Also, x86 CPUs have control registers that let you disable cache entirely, which makes them extremely slow. See How can the L1, L2, L3 CPU caches be turned off on modern x86/amd64 chips? Again, I assume other ISAs have similar features.

Peter Cordes

The CPU's processing speed is much higher than the speed of RAM; that's why we use a cache, which can deliver data in one cycle. On a cache miss, the data is fetched from RAM into the cache, and then processing continues. Suppose the processor worked directly with RAM: if RAM takes 10 cycles to deliver a chunk of data, the CPU stalls for the remaining 9 cycles. Once the data has been moved into the cache, it is available as a whole without stalling. That's why the CPU doesn't use RAM directly.

RAM is also fast; the only thing is that it takes more time to find the data. Once the data is found, the rest of the transfer is a little faster.

Cs Harish
  • RAM is faster than what?? L1d cache in a single core is faster than DRAM, at 32, 64, or 128 bytes per cycle on modern x86 (at CPU core clock speed), thus up to 512GB/s on Skylake-X. DDR4 DRAM, even during a burst transfer, runs at up to 3200 MT/s (megatransfers per second, of 8-byte chunks), per DIMM. So that's 3200 * 8 = 25.6 GB/s. Even with multiple DRAM channels, this is still vastly slower than L1d cache bandwidth, and a modern CPU has multiple cores each with their own L1d cache. – Peter Cordes Feb 28 '19 at 08:19
  • Or maybe you mean something about latency? Cache SRAM is inherently lower latency than DRAM, which is why SRAM is used for fast caches. [CAS Latency and static RAM (SRAM)](//electronics.stackexchange.com/q/350316). Even if DRAM was connected to the CPU as directly as cache was, it would be slower. – Peter Cordes Feb 28 '19 at 08:23