8

I need to measure the time taken by a C++ function under several hypotheses about memory-hierarchy efficiency (e.g. the time taken when reading a portion of an array causes a cache miss, a cache hit, or a page fault), so I'd like a library that lets me count cache misses and page faults in order to auto-generate a performance summary.

I know there are tools like cachegrind that give related statistics for a given application execution, but, as I said, I'd like a library.

Edit: Oh, I forgot: I'm using Linux and I'm not interested in portability; it's an academic thing.

Any suggestion is welcome!

akappa
  • Can you instrument the operating system to help give you some of this information? In a modern preemptive multitasking system with virtual memory, it's quite possible that the OS will do all kinds of crazy things to your application without its knowledge... – Carl Norum Mar 27 '11 at 08:01
  • I don't know; I'm using Linux. As for page faults, I know such statistics are of little interest because of the way modern OSes manage memory, as you said. In fact, I'm much more interested in L2 cache misses, which I think are much more independent of the operating system's memory management. – akappa Mar 27 '11 at 08:05

4 Answers

5

It looks like now there is exactly what I was searching for: perf_event_open.

It lets you initialize, enable, and disable performance counters and then fetch their values through a uniform and intuitive API: it gives you a special file descriptor, and reading from it returns a struct containing the information you requested.

It is a Linux-only solution and its functionality varies with the kernel version, so be careful :)
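As a minimal sketch of the counter workflow described above (not a definitive implementation), the following opens one counter for the calling thread, enables it around a piece of work, and reads the value back. The helper names `open_counter` and `count_event` are hypothetical. It assumes the kernel allows unprivileged counting (see `/proc/sys/kernel/perf_event_paranoid`) and that the hardware event is available, which is often not the case inside virtual machines.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstring>

// glibc ships no wrapper for this syscall, so invoke it through syscall(2).
static long perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                            int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Hypothetical helper: open a disabled counter for the calling thread on any CPU.
// Returns the file descriptor, or -1 if the kernel refuses (check errno).
static int open_counter(std::uint32_t type, std::uint64_t config) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = type;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;      // ignore hypervisor activity
    return static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
}

// Hypothetical helper: run `work` with the counter enabled and store the count
// in `out`. Returns false if the descriptor is invalid or the read fails.
template <typename F>
bool count_event(int fd, F&& work, std::uint64_t& out) {
    if (fd < 0) return false;
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    // With read_format left at 0, read() yields a single 64-bit count.
    return read(fd, &out, sizeof(out)) == static_cast<ssize_t>(sizeof(out));
}
```

To count cache misses, pass `PERF_TYPE_HARDWARE` and `PERF_COUNT_HW_CACHE_MISSES` to `open_counter`; for page faults, `PERF_TYPE_SOFTWARE` and `PERF_COUNT_SW_PAGE_FAULTS` should work even where hardware counters are unavailable. Close the descriptor with `close()` when done.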

akappa
5

Most recent CPUs (both AMD and Intel) have performance monitor registers that can be used for this kind of job. For Intel, they're covered in the programmer's reference manual, volume 3B, chapter 30. For AMD, it's in the BIOS and Kernel Developer's Guide.

Either way, you can count things like cache hits, cache misses, memory requests, data prefetches, etc. They have pretty specific selectors, so you could get a count of (for example) the number of reads on the L2 cache to fill lines in the L1 instruction cache (while still excluding L2 reads to fill lines in the L1 data cache).

There is a Linux kernel module to give access to MSRs (Model-specific registers). Offhand, I don't know whether it gives access to the performance monitor registers, but I'd expect it probably does.
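A rough sketch of reading a register through that module, assuming the `msr` driver is loaded (`modprobe msr`), the process has root privileges, and the device nodes appear as `/dev/cpu/<n>/msr`. The function name `read_msr` is hypothetical. The driver treats the file offset as the register number and returns 8 bytes per read. Actually programming the performance-monitor registers (event selectors and so on) is CPU-specific and is not shown here.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: read one 64-bit model-specific register on a given CPU.
// Returns false if the device cannot be opened (module not loaded, no
// permission) or if the register cannot be read.
bool read_msr(int cpu, std::uint32_t reg, std::uint64_t& value) {
    char path[64];
    std::snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return false;
    // The register number is passed as the file offset.
    ssize_t n = pread(fd, &value, sizeof(value), reg);
    close(fd);
    return n == static_cast<ssize_t>(sizeof(value));
}
```

For example, `read_msr(0, 0x10, v)` would try to read register 0x10 on CPU 0, which is the time-stamp counter on the x86 processors I am aware of; check the vendor manual for the register numbers of the actual performance counters.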

Jerry Coffin
  • Were you referring to perfmon2 when talking about the kernel module? – akappa Mar 27 '11 at 09:06
  • @akappa: as I recall, it was just called "MSR kernel module" or something like that. It's been a while since I used it though, so it's possible I don't remember the name correctly. – Jerry Coffin Mar 27 '11 at 19:03
  • Okay, I'll do some research if perfmon2 turns out to be too difficult to use. Thanks :) – akappa Mar 27 '11 at 19:41
3

Intel VTune is a performance tuning tool that does exactly what you are asking for. Of course, it works with Intel processors, since it accesses the internal processor counters described by Jerry Coffin, so it probably will not work on an AMD processor. It exposes literally hundreds of counters, such as cache hits/misses, branch prediction rates, etc. The real issue is understanding which counters to check ;)

sergico
  • Interesting tool, but I'm reading the documentation and can't find any mention of an API for fetching performance statistics at runtime. – akappa Mar 28 '11 at 16:35
  • Honestly, I have always used it as a stand-alone program. If I find more details I'll post them ;) – sergico Mar 28 '11 at 18:45
  • That's what I found: [link](http://software.intel.com/en-us/articles/performance-tools-for-software-developers-using-the-vtune-analyzer-pauseresume-api-from-microsoft-visual-basic/) and [link](http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBoQFjAA&url=http%3A%2F%2Fsoftware.intel.com%2Ffile%2F6743&rct=j&q=Vtune%20API&ei=M9iQTbaDCZGbhQe41-G7Dg&usg=AFQjCNGP4BIHFHtUbNUBY4wcth9_Tmmd8A&cad=rja) – sergico Mar 28 '11 at 18:48
  • @akappa: Please ignore the previous comment. That's what I found: [vtune API for pause/resume](http://software.intel.com/en-us/articles/performance-tools-for-software-developers-using-the-vtune-analyzer-pauseresume-api-from-microsoft-visual-basic/) and [VTune API for reader/writer](http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBoQFjAA&url=http%3A%2F%2Fsoftware.intel.com%2Ffile%2F6743&rct=j&q=Vtune%20API&ei=M9iQTbaDCZGbhQe41-G7Dg&usg=AFQjCNGP4BIHFHtUbNUBY4wcth9_Tmmd8A&cad=rja) Not sure if this is exactly what you are looking for though. – sergico Mar 28 '11 at 18:54
1

Cache misses cannot simply be counted. Most tools or profilers simulate the memory hierarchy by redirecting memory accesses to a function that models it. That means this kind of tool instruments the code at every place where a memory access is made, which makes your code run awfully slowly. I guess this is not what you intend.

However, depending on the hardware, you might have other possibilities. Even then, the OS has to support it, because otherwise you would get system-wide stats rather than ones tied to a process or thread.

EDIT: I found this interesting article that may help you: http://lwn.net/Articles/417979/
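For the page-fault half of the question, the OS already keeps per-process counts, and they can be read without any instrumentation through `getrusage`. The sketch below is only an illustration: the names `FaultCounts`, `page_faults_so_far`, and `minor_faults_while_touching` are made up for this example, and the number of faults reported depends on the allocator and on kernel settings such as transparent huge pages.

```cpp
#include <sys/resource.h>
#include <cstddef>
#include <vector>

// Hypothetical record of the two page-fault counters the kernel tracks.
struct FaultCounts {
    long minor;  // faults served without disk I/O
    long major;  // faults that required disk I/O
};

// Snapshot of the page faults taken so far by the whole process.
FaultCounts page_faults_so_far() {
    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return {usage.ru_minflt, usage.ru_majflt};
}

// Allocate and write a buffer of `bytes` bytes (must be > 0) and report how
// many minor faults the process took in the meantime.
long minor_faults_while_touching(std::size_t bytes) {
    FaultCounts before = page_faults_so_far();
    std::vector<char> buffer(bytes, 1);  // writing every byte forces pages in
    volatile char sink = buffer[bytes / 2];
    (void)sink;
    FaultCounts after = page_faults_so_far();
    return after.minor - before.minor;
}
```

Taking a snapshot before and after the function under test and subtracting gives a per-call figure, with the caveat that other threads in the same process contribute to the same counters.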

jdehaan
  • I was thinking of exactly that: some "magic" processor feature (like some nice registers that somehow count cache misses), abstracted by a library which auto-detects the processor type and does the necessary plumbing to get the actual data. I don't know if it's even possible for something like that to work (those "magic values" would have to be saved on every context switch, for example), but if such a library exists, it would be great. – akappa Mar 27 '11 at 08:10