
Do you know of any profiler tool that reports the total number of CPU operations (instructions) a C/C++ program executes? I need something like Valgrind's Callgrind on Linux...

Cristi
  • You can view the assembly that the compiler generates (e.g. `-S` in GCC); inferring the number of CPU instructions from assembly instructions is tricky, though. – Kerrek SB Feb 29 '12 at 22:42
  • @KerrekSB Yeah, a lot of people seem to assume that one instruction = one cycle, which isn't even a little bit true. – Crashworks Feb 29 '12 at 22:52
  • @Crashworks: Indeed - but short of reading the CPU manual, I don't know any reliable method to count cycles... – Kerrek SB Feb 29 '12 at 22:58
  • @KerrekSB Modern CPUs have hardware registers built in that count cycles, instructions retired, cache misses, etc for profiling purposes. So you can measure them in vivo. But that's not the same as looking and working out its perf ahead of time. With an out-of-order processor it's almost impossible to determine exactly how many cycles some code will take by reading the instruction stream. That's one reason why I prefer working on in-order chips -- there may be pipe bubbles on data hazards, but at least I know exactly where and exactly how long! – Crashworks Feb 29 '12 at 23:39



Intel has some tools, such as VTune. They also provide a performance counter library that you can use to instrument your code manually by reading the hardware performance counter registers before and after a piece of code.

Visual Studio has an instrumented profiler but I don't know if it gets down to the "instructions retired" level of detail.

You should ask yourself what information you really want: do you want to count the number of cycles spent in a function, or do you really want to know how much wall-clock time your app generally spends in each function? The latter is more useful in most cases, and you can get it more easily by sampling. (See also Mike Dunlavey's simple do-it-by-hand method, which works for big hotspots.)

Counting actual instructions retired and branch mispredicts and so on is only useful if you really understand the details of the CPU pipeline and how to optimize around it. Microseconds-per-function is typically what you really want to optimize instead.

Crashworks