Questions tagged [cpu-cache]

A CPU cache is a hardware structure used by the CPU to reduce the average memory access time.

Caching is beneficial whenever data elements are re-used.


Caching is a general strategy for hiding the latency of repeated accesses to a resource (storage) that has already been visited but is otherwise expensive (read: slow) to reach.


Caching does not speed up the memory itself.

The most a programmer can achieve is to exercise due care to allow latency masking in concurrent code execution: issuing memory-touching instructions well before the data is actually consumed, so that cache management can evict a least-recently-used (LRU) line and prefetch the requested data from slow DRAM.
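As a sketch of such latency masking, the following uses the GCC/Clang `__builtin_prefetch` intrinsic to request data a fixed distance ahead of its use; the distance of 16 elements and the function name are illustrative assumptions, and the right distance depends on memory latency and the work done per element:

```cpp
#include <cstddef>
#include <vector>

// Sketch: software prefetching to mask DRAM latency (GCC/Clang builtin).
// kPrefetchDistance = 16 is an assumption, not a universally good value.
long sum_with_prefetch(const std::vector<long>& v) {
    constexpr std::size_t kPrefetchDistance = 16;
    long sum = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        // Hint the cache hierarchy about a future read; a no-op on
        // compilers that do not support the builtin semantics.
        if (i + kPrefetchDistance < v.size())
            __builtin_prefetch(&v[i + kPrefetchDistance], /*rw=*/0, /*locality=*/1);
        sum += v[i];  // by now this element is hopefully already cached
    }
    return sum;
}
```

Note that hardware prefetchers already handle simple sequential patterns like this one well; explicit prefetching tends to pay off only for irregular access patterns (e.g. pointer chasing).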


How does it work?

Main memory is usually built with DRAM technology, which allows for big, dense, and cheap storage structures. But DRAM access is much slower than the cycle time of a modern CPU (the so-called memory wall). A CPU cache is a smaller memory, usually built with SRAM technology (expensive, but fast), that reduces the number of accesses to main memory by storing the main-memory contents that are likely to be referenced in the near future. Caches exploit a property of programs, the principle of locality: adjacent memory addresses are likely to be referenced close together in time (spatial locality), and an address that has been referenced once is likely to be referenced again soon (temporal locality).
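Spatial locality is why traversal order matters, as several questions below ask. A minimal sketch: C/C++ arrays are row-major, so iterating the last index innermost touches consecutive addresses and uses each fetched cache line fully, while swapping the loops strides across rows and wastes most of every line (the array size and function names here are illustrative):

```cpp
#include <cstddef>

constexpr std::size_t N = 256;

// Cache-friendly: j innermost walks consecutive addresses (row-major).
void fill_row_major(int (&a)[N][N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            a[i][j] = 1;
}

// Cache-unfriendly: i innermost jumps N * sizeof(int) bytes per access,
// so each access may land on a different cache line.
void fill_column_major(int (&a)[N][N]) {
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            a[i][j] = 1;
}
```

Both functions compute the same result; only the memory access pattern, and therefore the cache hit rate, differs.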

Each cache line is tagged with an address, held in extra SRAM cells. These tag cells record which specific address the stored data belongs to; since the cache can never mirror the entire system memory, this address must be kept alongside the data. The index into the array selects a set. The index and the tag can each use either physical or virtual (MMU-translated) addresses, leading to the three cache types PIPT, VIPT, and VIVT.
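The address decomposition can be sketched as arithmetic. The parameters below (64-byte lines, 64 sets) are assumptions chosen for illustration; real geometries vary and can be queried via CPUID or `/sys/devices/system/cpu/` on Linux:

```cpp
#include <cstdint>

// Hypothetical cache geometry: 64-byte lines (6 offset bits),
// 64 sets (6 index bits); the remaining high bits form the tag.
constexpr std::uint64_t kLineBytes = 64;
constexpr std::uint64_t kSets      = 64;

struct CacheAddr { std::uint64_t tag, index, offset; };

CacheAddr split(std::uint64_t addr) {
    return {
        addr / (kLineBytes * kSets),  // tag: identifies which line occupies the slot
        (addr / kLineBytes) % kSets,  // index: selects the set
        addr % kLineBytes             // offset: byte within the line
    };
}
```

For example, address 0x12345 (74565) splits into offset 5, index 13, tag 18 under this geometry.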

Modern CPUs contain multiple levels of cache. In SMP systems a cache level may be private to a single CPU, shared by a cluster of CPUs, or shared by the whole system. Because caching can leave multiple copies of the same data present in an SMP system, cache coherence protocols are used to keep those copies consistent. VIVT and VIPT caches can also interact with the MMU (and its own cache, commonly called a TLB).
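One practical consequence of coherence is false sharing: two cores writing to different variables that happen to share a cache line keep invalidating each other's copy. A common mitigation is padding each per-thread datum to its own line, sketched below; the assumed line size of 64 bytes is a typical value (C++17's `std::hardware_destructive_interference_size`, discussed in a question below, is the portable query, though not every standard library implements it yet):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Assumed cache line size; 64 bytes is common on x86.
constexpr std::size_t kLine = 64;

// alignas pushes each counter onto its own cache line, so the
// coherence protocol never ping-pongs one line between cores.
struct alignas(kLine) PaddedCounter {
    std::atomic<long> value{0};
};

long count_in_parallel(int n_threads, long per_thread) {
    std::vector<PaddedCounter> counters(n_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
        workers.emplace_back([&counters, t, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    long total = 0;
    for (auto& c : counters) total += c.value.load();
    return total;
}
```

Without the `alignas`, the code is still correct, just slower: coherence guarantees consistency, not performance.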

Questions regarding CPU cache inconsistencies, profiling or under-utilization are on-topic.

For more information see Wikipedia's CPU-cache article.


1011 questions
875
votes
10 answers

What is a "cache-friendly" code?

What is the difference between "cache unfriendly code" and the "cache friendly" code? How can I make sure I write cache-efficient code?
Noah Roth
  • 9,060
  • 5
  • 19
  • 25
409
votes
7 answers

Why does the order of the loops affect performance when iterating over a 2D array?

Below are two programs that are almost identical except that I switched the i and j variables around. They both run in different amounts of time. Could someone explain why this happens? Version 1 #include #include main () { …
Mark
  • 8,408
  • 15
  • 57
  • 81
249
votes
3 answers

How much of ‘What Every Programmer Should Know About Memory’ is still valid?

I am wondering how much of Ulrich Drepper's What Every Programmer Should Know About Memory from 2007 is still valid. Also I could not find a newer version than 1.0 or an errata. (Also in PDF form on Ulrich Drepper's own site:…
Framester
  • 33,341
  • 51
  • 130
  • 192
216
votes
5 answers

Approximate cost to access various caches and main memory?

Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory on Intel i7 processors? While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some…
Ted Graham
  • 3,511
  • 4
  • 24
  • 25
179
votes
15 answers

How does one write code that best utilizes the CPU cache to improve performance?

This could sound like a subjective question, but what I am looking for are specific instances, which you could have encountered related to this. How to make code, cache effective/cache friendly (more cache hits, as few cache misses as possible)?…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
170
votes
5 answers

Write-back vs Write-Through caching?

My understanding is that the main difference between the two methods is that in "write-through" method data is written to the main memory through the cache immediately, while in "write-back" data is written in a "later time". We still need to wait…
triple fault
  • 13,410
  • 8
  • 32
  • 45
96
votes
3 answers

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

C++17 added std::hardware_destructive_interference_size and std::hardware_constructive_interference_size. First, I thought it is just a portable way to get the size of a L1 cache line but that is an oversimplification. Questions: How are these…
Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239
95
votes
4 answers

Line size of L1 and L2 caches

From a previous question on this forum, I learned that in most memory systems, the L1 cache is a subset of the L2 cache, meaning any entry removed from L2 is also removed from L1. So now my question is how do I determine a corresponding entry in L1…
81
votes
5 answers

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

From the 11th Chapter(Performance and Scalability) and the section named Context Switching of the JCIP book: When a new thread is switched in, the data it needs is unlikely to be in the local processor cache, so a context-switch causes a flurry…
Geek
  • 26,489
  • 43
  • 149
  • 227
76
votes
4 answers

simplest tool to measure C program cache hit/miss and cpu time in linux?

I'm writing a small program in C, and I want to measure its performance. I want to see how much time it runs on the processor and how many cache hits and misses it has made. Information about context switches and memory usage would be nice to have…
jperelli
  • 6,988
  • 5
  • 50
  • 85
72
votes
10 answers

Which ordering of nested loops for iterating over a 2D array is more efficient

Which of the following orderings of nested loops to iterate over a 2D array is more efficient in terms of time (cache performance)? Why? int a[100][100]; for(i=0; i<100; i++) { for(j=0; j<100; j++) { a[i][j] = 10; …
Sachin Mhetre
  • 4,465
  • 10
  • 43
  • 68
66
votes
3 answers

Why does the speed of memcpy() drop dramatically every 4KB?

I tested the speed of memcpy() noticing the speed drops dramatically at i*4KB. The result is as follow: the Y-axis is the speed(MB/second) and the X-axis is the size of buffer for memcpy(), increasing from 1KB to 2MB. Subfigure 2 and Subfigure 3…
foool
  • 1,462
  • 1
  • 15
  • 29
64
votes
10 answers

C++ cache aware programming

Is there a way in C++ to determine the CPU's cache size? I have an algorithm that processes a lot of data and I'd like to break this data down into chunks such that they fit into the cache. Is this possible? Can you give me any other hints on…
Mat
  • 841
  • 1
  • 9
  • 5
57
votes
3 answers

How are cache memories shared in multicore Intel CPUs?

I have a few questions regarding Cache memories used in Multicore CPUs or Multiprocessor systems. (Although not directly related to programming, it has many repercussions while one writes software for multicore processors/multiprocessors systems,…
goldenmean
  • 18,376
  • 54
  • 154
  • 211
57
votes
3 answers

Are CPU registers and CPU cache different?

Are CPU registers and CPU cache different?
Qian
  • 1,462
  • 1
  • 14
  • 25