I wish to write software which could essentially profile the CPU cache (L2, L3, possibly L1) and the memory, to analyze performance.

Am I right in thinking this is un-doable because there is no access for the software to the cache content?

Another way of wording my Q: is there any way to know, from the OS/Application level, what data has been loaded into cache/memory?

EDIT: The operating system is Windows or Linux, and the CPU is an Intel desktop or Xeon processor.

Ken Thomases
intrigued_66

3 Answers

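You might want to look at Intel's PMU (Performance Monitoring Unit). Modern Intel desktop and Xeon processors have one. It is a set of special-purpose registers (Intel calls them Model Specific Registers, or MSRs) that you can program to count events such as cache misses, using the RDMSR and WRMSR instructions. Note that RDMSR and WRMSR are privileged instructions: they can only be executed in kernel mode (for example, by a driver), not directly by an ordinary application.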

Here is a document about Performance Analysis on i7 and Xeon 5500.

You might also want to check out Intel's Performance Counter Monitor (PCM), a set of routines that abstract the PMU. You can use it from a C++ application to measure several performance metrics live, including cache misses, and it also comes with GUI/command-line tools for standalone use.

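The Linux kernel also has a facility for manipulating MSRs: its msr driver exposes each logical CPU's MSRs as the device file /dev/cpu/N/msr, which root can read and write.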
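As a minimal sketch of that interface (assuming Linux with the msr module loaded via `modprobe msr` and root privileges), reading an MSR is just a `pread` on the device file, with the MSR address used as the file offset. The function name `read_msr` is hypothetical, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: reads MSR `reg` on logical CPU `cpu` through the
   Linux msr driver. Returns 0 on success and -1 on failure (msr module not
   loaded, insufficient privileges, nonexistent CPU, unsupported MSR). */
int read_msr(int cpu, uint32_t reg, uint64_t *value) {
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    /* The msr driver interprets the file offset as the MSR address
       and always transfers 8 bytes. */
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

For example, `read_msr(0, 0x10, &tsc)` reads IA32_TIME_STAMP_COUNTER on CPU 0. Actually programming the performance counters also requires writing the event-select MSRs (with `pwrite`) using encodings from Intel's manuals, which is exactly the work that PCM, perf and PAPI do for you.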

There are other utilities/APIs that also use the PMU: perf, PAPI.
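On Linux, `perf` is built on the `perf_event_open(2)` system call, which you can also call yourself. A minimal sketch, assuming Linux with hardware cache events available and a permissive `/proc/sys/kernel/perf_event_paranoid` setting, that counts L1 data-cache read misses for the calling thread while a callback runs (`count_l1d_read_misses` is a hypothetical name):

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc provides no wrapper for perf_event_open, so call it via syscall(). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: returns the number of L1D read misses incurred by
   work(), or -1 if the counter could not be opened or read. */
long long count_l1d_read_misses(void (*work)(void)) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HW_CACHE;
    /* Generic cache-event encoding: cache id | (op << 8) | (result << 16). */
    attr.config = (uint64_t)PERF_COUNT_HW_CACHE_L1D |
                  ((uint64_t)PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  ((uint64_t)PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;  /* count user-space activity only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread, on whichever CPU it runs. */
    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    long long count = 0;
    ssize_t n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? count : -1;
}
```

Swapping `PERF_COUNT_HW_CACHE_L1D` for `PERF_COUNT_HW_CACHE_LL` counts last-level cache misses instead. Inside many VMs these events are unavailable, in which case the open fails and the helper returns -1.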

ArjunShankar

Cache performance is generally measured in terms of hit rate and miss rate.
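Counting every cache access as either a hit or a miss, the two rates are related as:

```latex
\text{miss rate} = \frac{\text{misses}}{\text{hits} + \text{misses}}, \qquad
\text{hit rate} = 1 - \text{miss rate}
```

For example, 1,000 accesses with 50 misses give a miss rate of 5% and a hit rate of 95%.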

There are many tools to do this for you. Check how Valgrind's Cachegrind tool does cache profiling.

Cache performance is also generally measured on a per-program basis. Well-written programs incur fewer cache misses and get better cache performance, and poorly written code does the opposite.
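As an illustration of what "well written" means here, consider this hypothetical pair of functions that sum the same n-by-n array of doubles stored in row-major order. The row-major loop walks consecutive addresses, so each cache line (typically 64 bytes, i.e. 8 doubles, on Intel desktop and Xeon CPUs) is fully used before moving on. The column-major loop jumps n * 8 bytes per step, so for large n nearly every access can land on a different line.

```c
#include <stddef.h>

/* Cache-friendly: the inner loop touches consecutive elements. */
double sum_row_major(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Cache-unfriendly: the inner loop strides by a whole row each step. */
double sum_col_major(const double *a, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}
```

Both compute the same sum. Calling each on a large array (for example n = 4096) under `valgrind --tool=cachegrind --cache-sim=yes ./prog`, or timing them, should show the column-major version incurring many more D1 misses.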

Measuring the actual cache speed is the hardware manufacturers' concern, and you can refer to their manuals for those values.

The Callgrind/Cachegrind combination can help you track cache hits and misses.

Pavan Manjunath
    "Well written programs".... which is why I would like to know whether it is possible to monitor which variables are getting loaded into the cache? – intrigued_66 Apr 12 '12 at 12:14

This has some examples. TAU, an open-source profiler that works on top of PAPI, can also be used.

If, however, you want to write code that measures the cache statistics yourself, you can write a program using PAPI. PAPI gives you access to the hardware counters through portable preset events (such as PAPI_L1_DCM for L1 data cache misses), so you don't need to know the underlying system architecture. Programming the PMU directly means working with Model Specific Registers, so you must know which registers to use.

Perf allows measurement of L1 and LLC (the last-level cache, which is L3 on most current Intel desktop and Xeon processors). Cachegrind, on the other hand, simulates L1 and LLC (which can be L2 or L3, whichever is the highest-level cache) instead of reading the hardware counters. Use Cachegrind only if you don't need fast results, because it runs the program roughly 10x slower or more.

Community
sol