
I need to profile a couple of C programs and get an annotated file with the percentage of execution time taken by each line, or at least by each block (while/if-else/for/function).

So far I have looked into Valgrind (Callgrind), gperf, and some other tools. What I get so far is:

  • The count of how many times each function or line of source code is executed.
  • Or that count as a percentage.
  • Or the execution time taken by each function call.

What I need, however, is the percentage of execution time, not the count, and that should be for each line of source code, or at least for each block (while/if-else/for/function).

Can someone let me know of a way I can do this?

Thanks,

Gabriel Staples
jav321

2 Answers


I believe perf(1) (part of the linux-tools-common package in Ubuntu) will get you what you want. It makes use of a kernel-based subsystem called Performance counters for Linux, included in newer kernels. More information can be found here.

Simple usage example below. Make sure to compile with debugging symbols so samples can be mapped back to source lines.

$ gcc -g -O2 myprogram.c -o myprogram
$ perf record ./myprogram arg1 arg2
$ perf report
$ perf annotate

Cachegrind might be worth looking into too.

Ulfalizer
  • It seems `perf report` shows percentages per assembly instruction even, along with the source lines corresponding to those instructions. Not sure if there's a way to get percentages just per source line. – Ulfalizer Mar 04 '15 at 11:29
  • If there is a way to get the execution percentage for loops, that will do my job, because I want to detect hotspot loops in a code that take above a certain percentage of execution time. – jav321 Mar 04 '15 at 14:04
  • @jav321 It will get you the execution percentage of each instruction within loops (i.e., the amount of time spent at particular instructions within the loop in total). For short loops it should be simple to just add up all the instructions. – Ulfalizer Mar 04 '15 at 14:33

You need something that samples the stack, either on CPU time, or wall-clock time if you want to include I/O. You don't need particularly high frequency of sampling.

Each stack sample is a list of code locations, some of which can be traced to lines in the code.

The inclusive percentage cost of a line of code is just the number of stack samples that contain it, divided by the total number of stack samples (multiplied by 100).

The usefulness of this is it tells you what percentage of total time you could save if you could get rid of that line.

The exclusive percentage is the fraction of samples where that line appears at the end.

If a line of code appears more than once in a stack sample, that's still just one sample containing it. (That takes care of recursion.)

What tools can do this? Maybe oprofile. Certainly Zoom.

Mike Dunlavey
  • I believe both `oprofile` and `Zoom` use the *Performance counters for Linux* kernel subsystem too if available. – Ulfalizer Mar 05 '15 at 10:45
  • [Allinea MAP](http://www.allinea.com/products/map) does line level time display (as percentage) - if a loop is a hotspot it will show it - and there is in depth graphical detail to help fix the issues. Gprof also has a line by line mode if you need free. – David Apr 11 '15 at 06:49
  • @David: It's hard to tell from Allinea's web site. Are they reporting a line's inclusive (not just exclusive) percent (not just execution count or absolute time). Does the time include blockage, like I/O, sleeps, etc.(or is it only "CPU" time)? What's more, just line-level percent is not as useful as seeing the entire context that looking at a stack sample gives you. The bad assumption made in all profilers is that you need to know *where* time is spent, not *why* - resulting in missing big speedups. And if you want to get me started on *gprof*, I want to be charitable, but it's hard. – Mike Dunlavey Apr 11 '15 at 13:29
  • @David: [*Here are some points*](http://archive.today/9r927) regarding *gprof* and how the concepts in it have been infecting profilers ever since. Some have evolved to be almost free of those, like Zoom (from Rotate Right) and maybe Allinea (I can't tell), but they have a way to go. The hard part of that is to realize that the spiffy summarizing UI actually *hides* massive speedups, because they don't capture the *reasons why* time is spent, and they can't intelligently aggregate different lines that have common reasons. – Mike Dunlavey Apr 11 '15 at 13:40
  • @MikeDunlavey - MAP's time sampling is based on wall clock: we measure I/O and other causes of non-compute delay - like communication amongst other things (as it's used a lot for MPI applications on multiple nodes). Think of it as measuring what the cores are doing over real time. We use this for threads too - our users care about actual wall time as they typically do OpenMP and want all their cores to be busy, if any thread is sleeping, that's time they are wasting (as they use 1 core per thread typically) and a Bad Thing... – David Apr 12 '15 at 14:36
  • @David: That's good it's on wall-clock time, and I assume it takes stack samples, and that the percent shown on a line is the percent of stack samples containing that line. It would be even better if the user could browse individual samples. If you want, drop me an email, and I'll send you a 20-slide .ppt file I'm hoping to present at usenix that walks through an example. – Mike Dunlavey Apr 13 '15 at 12:31