How to monitor CPU cache on Windows?

Question

I want to keep my code from flushing off the L2 cache as much as possible.

How would you achieve that in C++ / C# and how would you make it accountable.

EDIT : can I collect number of L2 cache misses alternatively? Answer : Yes Can I get the L2 cache miss count of each process on the windows platform?

Not directly, no. VTune can tell you something about things like how much data is getting moved between caches, but not anything (at least to my recollection) about exactly what data is in a particular level of cache. You're pretty much stuck figuring that out manually based on the addresses where data is stored, and when/how you use it. — Jerry Coffin, Jan 17 '13 at 15:05
There are some MSRs that let you read out cache usage, but it's probably still not as cool as you want. The problem is that it is kinda like Schrodinger's cat. Have a look at this: http://stackoverflow.com/questions/10122520/profiling-cpu-cache-memory-from-the-os-application — thang, Jan 17 '13 at 15:07

score 4 · Accepted Answer · edited Oct 05 '18 at 11:38

It seems that people are reluctant to give away information in this area (c++ or c# doesnt matter). so I would have probably to create my own strategy which will be probably using set of approaches rather than a rigid set of rules or recipes.

To achieve max hit/miss ratio for a Windows application I would probably:

dedicate a physical CPU for the application making sure no other processes are using it, disable hyperthreading, work with BIOS as server manufacturers often provide specific settings for low-latency applications
make sure the piece of code in question has size smaller than the L2 cache on that CPU core
make sure the code does not have something what makes it explicitely flushed off the cache (eg memory barriers in C#)
monitor L2 miss count: http://stackoverflow.com/questions/5141350/can-i-get-the-l2-cache-miss-count-of-each-process-on-the-windows-platform

score 0 · Answer 2 · answered Jan 17 '13 at 15:09

0

The question seems to be based on misunderstandings of how caching works on modern x86 CPUs. For example, it says "L2 instead of L3". On almost all modern x86 CPUs (both Intel and AMD), it's impossible to have data in the L2 cache instead of the L3 cache. That would cause cache coherency to fail because the absence of data in the L3 cache signals its absence in the L2 caches and permits a core to assume no other core has cached it.

Bluntly, people who don't know much about how CPUs work shouldn't be trying to control what data is in their caches. It would be miraculous if they made things better than the folks who designed those CPUs in the first place.

answered Jan 17 '13 at 15:09

David Schwartz

179,497
17
214
278

5

thanks for the witty answer. but it is useles. oh wait a second but you know that.. let me edit the question may be you would give a real answer.. you know. just for fun – Boppity Bop Jan 17 '13 at 15:15
Actually, I think getting you to ask the right question did you more good than any "real answer" to your original question could have. – David Schwartz Jan 17 '13 at 16:32
2

sure thing. its how you pass the time :))) thanks David, or is it Jesus? :)))))) – Boppity Bop Jan 17 '13 at 17:31
1

@David Schwartz What you said about having data in L2 but not in L3 causing cache coherency to fail is not valid for AMD processors. Instead of having the same data in all cache levels, AMD allow different cache levels to have different data. This is called exclusive cache, while having the same data in every level is called inclusive cache. – Vinícius Gobbo A. de Oliveira Jan 22 '13 at 13:20
That's why I say "*almost* all modern x86 CPUs". Today, modern AMD x86 CPUs [also have inclusive cache](http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/4). – David Schwartz Jan 22 '13 at 13:22
I've done this research earlier on 2011. This is new for me. Thank you for pointing it out! – Vinícius Gobbo A. de Oliveira Jan 22 '13 at 13:24
Historically, AMD CPUs were desperately short on cache and AMD used an exclusive design to effectively get slightly larger caches. This is not the case anymore and the performance benefit of an inclusive cache (not having to wait for a core's L2 cache to make sure it doesn't hold an entry it never touched) outweighs the slightly larger effective cache sizes. AMD would have moved sooner, it just was such a big design change. – David Schwartz Jan 22 '13 at 13:26
2

I think it's fairly obvious that OP meant "instead of **just** L3" and did not misunderstand how caches work. This answer seems no longer relevant to (the current state of) the question. Also, even though the second paragraph is true, it should not prevent someone from trying to write efficient code. – RoG Nov 02 '17 at 08:16

How to monitor CPU cache on Windows?

2 Answers2