Cachegrind simulates how your program interacts with a machine's cache hierarchy and (optionally) branch predictor. It simulates a machine with independent first-level instruction and data caches (I1 and D1), backed by a unified second-level cache (L2).
Questions tagged [cachegrind]
32 questions
17 votes, 1 answer
How do you interpret cachegrind output for caching misses?
Out of curiosity, I coded up several different versions of matrix multiplication and ran cachegrind against them. In my results below, which parts are L1, L2, and L3 misses and references, and what does it all really mean? Below is my code…

Kevin Melkowski
16 votes, 1 answer
How to write instruction cache friendly program in c++?
Recently Herb Sutter gave a great talk on "Modern C++: What You Need to Know". The main theme of this talk was efficiency and how data locality and accessing the memory matters.
He also explained how linear access of memory (array/vector) would…

Mantosh Kumar
14 votes, 2 answers
Wincachegrind gives an error
When I tried to use WinCacheGrind on the cachegrind file, it returned:
Cannot find call target.
cachegrind.out line number:68
Does anybody know how to solve this?
UPDATE: here is a screenshot of the error:
Click this link

Da Heel
14 votes, 2 answers
Different read and write count using cachegrind and callgrind
I am doing some experiments with Cachegrind, Callgrind and gem5. I noticed that a number of accesses were counted as reads by Cachegrind, as writes by Callgrind, and as both reads and writes by gem5.
Let's take a very simple example:
int main() {
…

Maxime Chéramy
11 votes, 3 answers
How can I pinpoint if the slowness in my program is a CPU cache issue (on Linux)?
I'm currently trying to understand some very very strange behavior in one of my C programs. Apparently, adding or removing a seemingly inconsequential line at the end of it drastically affects the performance in the rest of the program.
My program…

hugomg
9 votes, 2 answers
Cache friendly method to multiply two matrices
I intend to multiply two matrices using a cache-friendly method (one that leads to fewer misses).
I found out that this can be done with a cache-friendly transpose function.
But I am not able to find this algorithm. Can I know how to…

Aakash Anuj
8 votes, 1 answer
Valgrind vs. Linux perf correlation
Suppose that I choose perf events instructions, LLC-load-misses, LLC-store-misses. Suppose further that I test a program prog varying its input. Is valgrind supposed to give me the "same" functional results for the same input and the same counter?…

Dervin Thunk
7 votes, 1 answer
Why isn't cachegrind completely deterministic?
Inspired by SQLite, I'm looking at using valgrind's "cachegrind" tool to do reproducible performance benchmarking. The numbers it outputs are much more stable than any other method of timing I've found, but they're still not deterministic. As an…

Sophie Alpert
5 votes, 0 answers
Is valgrind's cachegrind still the go-to tool in 2021?
I'm a long-time user of cachegrind for program profiling, and recently went back to check the official documentation once more: https://valgrind.org/docs/manual/cg-manual.html
In it, there are multiple references to CPU models, implementation…

leosh
5 votes, 3 answers
Cachegrind: Why so many cache misses?
I'm currently learning about various profiling and performance utilities under Linux, notably valgrind/cachegrind.
I have the following toy program:
#include
#include
int
main() {
const unsigned int COUNT = 1000000;
…

Andrej Kesely
4 votes, 1 answer
How to limit cachegrind files created by xdebug-profiler
Is there any way to limit the cachegrind files (xdebug profiling output)?
I would like to enable xdebug.profile for debugging the whole project (not only on trigger), but if someone forgets to disable it, I don't want the disk to fill up.
I didn't find any option…

yulka
4 votes, 1 answer
I don't understand cache miss count between cachegrind vs. perf tool
I am studying cache effects using a simple micro-benchmark.
I think that if N is bigger than the cache size, then the cache misses on the first read of every cache line.
On my machine, the cache line size is 64 bytes, so I think in total the cache incurs N/8…

libertyjin
4 votes, 2 answers
Different cache miss count for a same program between multiple runs
I am using Cachegrind to retrieve the number of cache misses of a static program compiled without libc (just a _start that calls my main function and an exit syscall in asm). The program is fully deterministic, the instructions and the memory…

Maxime Chéramy
4 votes, 3 answers
What is the price of a cache miss
I'm analyzing some code and using cachegrind to get the number of cache misses (L2 and L3) in the execution.
My question is: how do I determine the time spent waiting for the cache to become ready, based on the cache misses?
I would like to be able to say…

Martin Kristiansen
3 votes, 0 answers
Unique matrix transpose problem: contradictory reports from cachegrind and perf
In the following question, we're talking about an algorithm which transposes a matrix of complex values, struct complex {double real = 0.0; double imag = 0.0;};. Owing to a special data layout, there is a stride-n*n access between the rows, which…

Nitin Malapally