Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial whenever data elements are re-used.


Caching is a general strategy for hiding the latency of repeated accesses to a resource (storage) that has already been visited but is otherwise expensive (read: slow) to reach.


Caching does not speed up the memory itself.

The most a programmer can achieve is to exercise due care to allow latency masking in concurrent code execution: issuing memory-touching instructions well before the data is actually consumed, so that cache management can evict a least-recently-used (LRU) line and prefetch the requested data from slow DRAM.
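As a sketch of such latency masking, the following uses the GCC/Clang `__builtin_prefetch` intrinsic to request data a fixed distance ahead of its use; the distance of 16 elements and the function name are illustrative assumptions, and the right distance depends on memory latency and the work done per element:

```cpp
#include <cstddef>
#include <vector>

// Sketch: software prefetching to mask DRAM latency (GCC/Clang builtin).
// kPrefetchDistance = 16 is an assumption, not a universally good value.
long sum_with_prefetch(const std::vector<long>& v) {
    constexpr std::size_t kPrefetchDistance = 16;
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Hint the cache hierarchy about a future read; a no-op on
        // compilers that do not support the builtin semantics.
        if (i + kPrefetchDistance < v.size())
            __builtin_prefetch(&v[i + kPrefetchDistance], /*rw=*/0, /*locality=*/1);
        sum += v[i];  // by now this element is hopefully already cached
    }
    return sum;
}
```

Note that hardware prefetchers already handle simple sequential patterns like this one well; explicit prefetching tends to pay off only for irregular access patterns (e.g. pointer chasing).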


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense, and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that has been referenced once is likely to be referenced again soon (temporal locality).
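Spatial locality is why traversal order matters, as several questions below ask. A minimal sketch: C/C++ arrays are row-major, so iterating the last index innermost touches consecutive addresses and uses each fetched cache line fully, while swapping the loops strides across rows and wastes most of every line (the array size and function names here are illustrative):

```cpp
#include <cstddef>

constexpr std::size_t N = 256;

// Cache-friendly: j innermost walks consecutive addresses (row-major).
void fill_row_major(int (&a)[N][N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            a[i][j] = 1;
}

// Cache-unfriendly: i innermost jumps N * sizeof(int) bytes per access,
// so each access may land on a different cache line.
void fill_column_major(int (&a)[N][N]) {
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            a[i][j] = 1;
}
```

Both functions compute the same result; only the memory access pattern, and therefore the cache hit rate, differs.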

Each cache line is tagged with an address, held in extra SRAM cells. These tag cells record which specific address the stored data belongs to; since the cache can never mirror the entire system memory, this address must be kept alongside the data. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three cache types PIPT, VIPT, and VIVT.
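The address decomposition can be sketched as arithmetic. The parameters below (64-byte lines, 64 sets) are assumptions chosen for illustration; real geometries vary and can be queried via CPUID or `/sys/devices/system/cpu/` on Linux:

```cpp
#include <cstdint>

// Hypothetical cache geometry: 64-byte lines (6 offset bits),
// 64 sets (6 index bits); the remaining high bits form the tag.
constexpr std::uint64_t kLineBytes = 64;
constexpr std::uint64_t kSets      = 64;

struct CacheAddr { std::uint64_t tag, index, offset; };

CacheAddr split(std::uint64_t addr) {
    return {
        addr / (kLineBytes * kSets),  // tag: identifies which line occupies the slot
        (addr / kLineBytes) % kSets,  // index: selects the set
        addr % kLineBytes             // offset: byte within the line
    };
}
```

For example, address 0x12345 (74565) splits into offset 5, index 13, tag 18 under this geometry.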

Modern CPUs contain multiple levels of cache. In SMP systems a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can leave multiple copies of the same data present in an SMP system, cache coherence protocols are used to keep those copies consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).
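One practical consequence of coherence is false sharing: two cores writing to different variables that happen to share a cache line keep invalidating each other's copy. A common mitigation is padding each per-thread datum to its own line, sketched below; the assumed line size of 64 bytes is a typical value (C++17's `std::hardware_destructive_interference_size`, discussed in a question below, is the portable query, though not every standard library implements it yet):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is common on x86.
constexpr std::size_t kLine = 64;

// alignas pushes each counter onto its own cache line, so the
// coherence protocol never ping-pongs one line between cores.
struct alignas(kLine) PaddedCounter {
    std::atomic<long> value{0};
};

long count_in_parallel(int n_threads, long per_thread) {
    std::vector<PaddedCounter> counters(n_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    long total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}
```

Without the `alignas`, the code is still correct, just slower: coherence guarantees consistency, not performance.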

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU-cache article.


1011 questions
875
votes
10 answers

What is a "cache-friendly" code?

What is the difference between "cache unfriendly code" and the "cache friendly" code? How can I make sure I write cache-efficient code?
Noah Roth
  • 9,060
  • 5
  • 19
  • 25
409
votes
7 answers

Why does the order of the loops affect performance when iterating over a 2D array?

Below are two programs that are almost identical except that I switched the i and j variables around. They both run in different amounts of time. Could someone explain why this happens? Version 1 #include #include main () { …
Mark
  • 8,408
  • 15
  • 57
  • 81
249
votes
3 answers

How much of ‘What Every Programmer Should Know About Memory’ is still valid?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
Framester
  • 33,341
  • 51
  • 130
  • 192
216
votes
5 answers

Approximate cost to access various caches and main memory?

Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory on Intel i7 processors? While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some…
Ted Graham
  • 3,511
  • 4
  • 24
  • 25
179
votes
15 answers

How does one write code that best utilizes the CPU cache to improve performance?

This could sound like a subjective question, but what I am looking for are specific instances, which you could have encountered related to this. How to make code, cache effective/cache friendly (more cache hits, as few cache misses as possible)?…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
170
votes
5 answers

Write-back vs Write-Through caching?

My understanding is that the main difference between the two methods is that in "write-through" method data is written to the main memory through the cache immediately, while in "write-back" data is written in a "later time". We still need to wait…
triple fault
  • 13,410
  • 8
  • 32
  • 45
96
votes
3 answers

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

C++17 added std::hardware_destructive_interference_size and std::hardware_constructive_interference_size. First, I thought it is just a portable way to get the size of a L1 cache line but that is an oversimplification. Questions: How are these…
Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239
95
votes
4 answers

Line size of L1 and L2 caches

From a previous question on this forum, I learned that in most memory systems, the L1 cache is a subset of the L2 cache, meaning any entry removed from L2 is also removed from L1. So now my question is how do I determine a corresponding entry in L1…
81
votes
5 answers

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

From the 11th Chapter(Performance and Scalability) and the section named Context Switching of the JCIP book: When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context-switch causes a flurry…
Geek
  • 26,489
  • 43
  • 149
  • 227
76
votes
4 answers

simplest tool to measure C program cache hit/miss and cpu time in linux?

I'm writing a small program in C, and I want to measure its performance. I want to see how much time it runs on the processor and how many cache hits and misses it has made. Information about context switches and memory usage would be nice to have…
jperelli
  • 6,988
  • 5
  • 50
  • 85
72
votes
10 answers

Which ordering of nested loops for iterating over a 2D array is more efficient

Which of the following orderings of nested loops to iterate over a 2D array is more efficient in terms of time (cache performance)? Why? int a[100][100]; for(i=0; i<100; i++) { for(j=0; j<100; j++) { a[i][j] = 10; …
Sachin Mhetre
  • 4,465
  • 10
  • 43
  • 68
66
votes
3 answers

Why does the speed of memcpy() drop dramatically every 4KB?

I tested the speed of memcpy() noticing the speed drops dramatically at i*4KB. The result is as follow: the Y-axis is the speed(MB/second) and the X-axis is the size of buffer for memcpy(), increasing from 1KB to 2MB. Subfigure 2 and Subfigure 3…
foool
  • 1,462
  • 1
  • 15
  • 29
64
votes
10 answers

C++ cache aware programming

Is there a way in C++ to determine the CPU's cache size? I have an algorithm that processes a lot of data and I'd like to break this data down into chunks such that they fit into the cache. Is this possible? Can you give me any other hints on…
Mat
  • 841
  • 1
  • 9
  • 5
57
votes
3 answers

How are cache memories shared in multicore Intel CPUs?

I have a few questions regarding Cache memories used in Multicore CPUs or Multiprocessor systems. (Although not directly related to programming, it has many repercussions while one writes software for multicore processors/multiprocessors systems,…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
57
votes
3 answers

Are CPU registers and CPU cache different?

Are CPU registers and CPU cache different?
Qian
  • 1,462
  • 1
  • 14
  • 25