What are the signs of non-data cache misses (instruction, TLB, etc.)?

Question

When you're debugging performance-critical code and looking at the disassembly, it's not too hard to spot bottlenecks due to data cache misses:

Load/store instructions tend to be the usual bottlenecks, which means that if you stop the program, chances are that it will stop close to a load/store instruction loading from some unpredictable memory address.
Similarly, one way to find branch mispredictions is to see whether breaking the program usually stops it near particular jumps, and then to look at the code to see whether those jumps are predictable.

(Or, at least, that's how I try to find such bottlenecks. If I'm looking for the wrong symptoms, let me know.)

What, however, are the symptoms of other kinds of cache misses?
I do know they're rare, but I still want to know how to spot them if/when they come up.

By "other" caches, I mean things like:

Instruction Cache(s)
Translation Lookaside Buffer
Bonus points for other important caches I should know about but I'm not aware of

The signs are the numbers you see back in the profiler output. Supported by any decent profiler that can read back the performance counters implemented by the processor. Please avoid list questions, just google "intel processor performance counters". — Hans Passant, Nov 26 '13 at 11:11
@HansPassant: Good point, but is there a way to do this when I don't have a profiler handy for the current language? The symptoms I mentioned for a data cache earlier don't require a profiler -- they just require pausing the program randomly a few times. They don't always need a disassembler either -- it's not hard to see possible bottlenecks due to pointer loads or difficult-to-predict branches just by looking at the current line in the source code, in any language. Can we do the same for other types of bottlenecks? — user541686, Nov 26 '13 at 11:19
These counters don't have anything to do with a language, they strictly observe machine code execution. Which of course is universal. There's completely no point in trying to glean profile info from "pausing the program a few times", that's just a waste of time. Use the proper tools. — Hans Passant, Nov 26 '13 at 11:28
@HansPassant: What I meant regarding the language is that not every language's IDE has a profiler handy, and not every development environment has an IDE handy. Sure, if I'm on a local machine with VS Ultimate installed, I'll use the performance counters. When I'm on a different machine with something more mediocre, though -- pausing the program is easy, but getting a profiler up and running isn't. What I'm describing is a legitimate bottleneck-finding technique (manual sampling, basically); it's not something I made up out of the blue. The question is how to do it for instruction caches. — user541686, Nov 26 '13 at 11:35
@HansPassant: And for the record, here's just one example to show you "pausing the program a few times" is a perfectly legitimate technique: http://stackoverflow.com/a/18217639 Not everyone has a profiler handy as often as a generic debugger, and not everything is as easy to see in a profiler as it is to see with just manual sampling. — user541686, Nov 26 '13 at 11:38

score 1 · Accepted Answer · edited Nov 26 '13 at 21:20

Ah, the good old poor man's profiler technique. I'd be lying if I said I haven't used it from time to time, but it is indeed very problematic: it will probably be biased toward finding heisenbugs and will not necessarily reflect the real behavior. Another issue is that instructions overlap on modern out-of-order CPUs, so even if the program takes longer on some load or store, your actual break point might fall far away from it (long before the long-latency load instruction actually commits, or long after a store instruction does).

That said, if you insist on using it, you can:

1. Check the page offset of the load/store addresses in the vicinity of the break point (4 KB, 2 MB, ..., depending on your system configuration). A small offset within a stream of accesses might indicate a TLB miss and a page walk.
2. Use the LBRs (last branch records) to check the behavior and predictability of the most recent branches.
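
The page-offset check can be sketched in a few lines. This is only an illustration, assuming you have copied a handful of load/store addresses out of the debugger by hand; the helper names `page_offset` and `first_touches` and the sample addresses are hypothetical. It flags the first access to each page, which is where a TLB miss could occur if the translation is not already cached. It cannot confirm that a miss actually happened.

```python
# Minimal sketch (hypothetical helpers): flag accesses that enter a new page.
# Assumes page sizes are powers of two, e.g. 4 KB or 2 MB.

def page_offset(addr, page_size=4096):
    """Return the offset of addr within its page."""
    return addr & (page_size - 1)

def first_touches(addrs, page_size=4096):
    """Return (address, offset) for the first access to each distinct page.

    In an ascending stream these are the accesses with a small page offset,
    i.e. the candidates for a TLB miss and page walk.
    """
    seen_pages = set()
    touches = []
    for addr in addrs:
        page = addr // page_size
        if page not in seen_pages:
            seen_pages.add(page)
            touches.append((addr, page_offset(addr, page_size)))
    return touches

# Made-up addresses, as they might be read from a debugger near the break point.
samples = [0x7f0000000ff0, 0x7f0000000ff8, 0x7f0000001000, 0x7f0000001008]
for addr, off in first_touches(samples):
    print(hex(addr), "offset", hex(off))
```

With 4 KB pages the third sample address is the first touch of a new page; with `page_size=2 * 1024 * 1024` all four fall within one page, so which page size the system uses for that mapping changes the conclusion.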

I can't think of a way to recognize an I-cache miss, as these occur even earlier and are further decoupled from the execution pipelines where your debugger is likely to catch the "current" instruction.

Ahh #1 is a good point. #2 might be a bit tough without specialized tools though. +1 — user541686, Nov 26 '13 at 21:22
