
My understanding is that, by default, gprof measures CPU time. Is there a way to get it to profile based on wall-clock time instead?

My program does a lot of disk i/o, so the CPU time it uses only represents a fraction of the actual execution time. I need to know which portions of the disk i/o take up the most time.

jetwolf

5 Answers


You can measure wall-clock time by using the profiler from google-perftools (gperftools). To switch it to wall-clock mode, set the environment variable CPUPROFILE_REALTIME=1.
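A typical invocation might look like the following sketch (myprog and the file names are placeholders; it assumes gperftools and its pprof tool are installed):

```shell
# Link the program against libprofiler (or LD_PRELOAD it at run time).
gcc -g -o myprog myprog.c -lprofiler

# CPUPROFILE names the output file; CPUPROFILE_REALTIME=1 makes the
# profiler sample on wall-clock time (SIGALRM) instead of CPU time (SIGPROF).
CPUPROFILE=myprog.prof CPUPROFILE_REALTIME=1 ./myprog

# View the profile as a flat text listing
# (on some distros the tool is installed as google-pprof).
pprof --text ./myprog myprog.prof
```

Because sampling is driven by wall-clock time, functions that sit blocked in disk I/O accumulate samples just like CPU-bound ones.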

Victor Sergienko
dronnix
  • This should be used with caution because it [disables sleep](https://github.com/gperftools/gperftools/issues/271) – Mike Vella Aug 02 '18 at 13:13

gprof won't do this. Look at this.

And this.

In a nutshell: run the program under gdb, hit Ctrl-Break or Ctrl-C about 10 times at random, and display the call stack each time. If your I/O is taking (for example) 60% of the time, then on roughly 6 out of 10 pauses you will see it in a writebuf or readbuf routine, and the lines of code requesting that I/O will be clearly visible on the stack.

You could also use lsstack to get the same information.
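The manual procedure above can be scripted. A rough sketch (it assumes gdb is installed and you have permission to attach; the backgrounded sleep stands in for a process blocked in I/O):

```shell
#!/bin/sh
# Start a stand-in target that spends its time blocked,
# the way a program waiting on disk I/O would.
sleep 30 & pid=$!

# Take 10 wall-clock stack samples at random-ish moments.
# -batch makes gdb print the backtrace, detach, and resume the target.
for i in $(seq 1 10); do
    sleep 0.3
    gdb -batch -p "$pid" -ex "bt" 2>/dev/null
done

kill "$pid"
```

Every backtrace that lands in a read/write wrapper is one "sample" pointing at your I/O, and the caller frames on the stack tell you which call site is responsible.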

Mike Dunlavey
  • Hmm... wouldn't this method be very statistically inaccurate? Is there an automated way of doing this, which takes much more than 10 samples, say 1000 samples, but at uniform intervals, and then reports which functions were encountered most often? – jetwolf May 11 '10 at 15:38
  • @jetwolf: Zoom is an example of a profiler that does it with 10^3 samples, but check the first link, especially items 5, 2, 7, and 9. – Mike Dunlavey May 11 '10 at 16:36
  • @jetwolf: Example: suppose I/O is exactly 60%. Standard deviation of number of samples to show it is sqrt(NF(1-F)). For 10 samples that is +/- 1.55, for 1000, it is 15.5. So in 10 samples you will see it roughly 4.45 - 7.55 times. In 1000 samples you will see it roughly 584.5 - 615.5 times. Either way, you'll see exactly what's causing it so if it's fixable, you can fix it. – Mike Dunlavey May 11 '10 at 17:00
  • @jetwolf: Another way to put it. Suppose I/O is exactly 60%. 10 samples would measure 60%, give or take 15%. For 1000 samples, it would be 1.5%. For 100,000 samples it would be 0.15%. So they all measure it, but they only show you the cause of the problem if they summarize at the level of lines on the call stack, and profilers that take lots of samples tend to throw that information away, so the extra measurement precision comes at the expense of finding the problem. – Mike Dunlavey May 18 '10 at 18:09

You can also use strace or cachegrind to profile the code. strace reports the time spent in each system call, and cachegrind gives a detailed analysis of resource utilization.
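For the disk-I/O question specifically, strace's per-syscall summary is often enough by itself. A sketch (dd is just a stand-in workload that does some disk writes; substitute your own program):

```shell
# -c: tally calls, errors, and time per syscall; -f: follow child processes.
# Newer strace versions also accept -w, which totals wall-clock time per
# syscall (time spent blocked included) rather than system CPU time.
strace -f -c -w dd if=/dev/zero of=/tmp/strace_demo bs=1M count=8
```

The summary table printed at exit shows the share of time spent in read, write, fsync, and friends; without -w it reflects system CPU time, which undercounts time spent waiting on the disk.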

Rohit

You can do this using gcc's -finstrument-functions option. It arranges for a hook to be called at the entry and exit of every instrumented function; you just need to provide the two callbacks (__cyg_profile_func_enter and __cyg_profile_func_exit).

You can find more details by looking up the -finstrument-functions option in the gcc manual.

There is also a good post on Balau's technical blog that provides an end-to-end overview/example of how this works: Trace and profile function calls with GCC

Cody Gray
mavimassi

It is very easy to change gprof to do wall-clock profiling. Only two identifiers (8 characters in all) need to be replaced:

ITIMER_PROF -> ITIMER_REAL

SIGPROF -> SIGALRM

in the file glibc/sysdeps/posix/profil.c, in the function __profil, near the calls to setitimer and sigaction (more precisely, __setitimer and __sigaction).

After the change, any program that itself uses SIGALRM will be broken, and any program that lacks code to restart interrupted (blocking) syscalls can give wrong results.

Alternatively, you can change the int values directly in the glibc binary (please don't do this to the system-wide libc.so; make a separate copy and point the program at it with LD_LIBRARY_PATH).

For a binary patch: ITIMER_PROF is 2, ITIMER_REAL is 0, SIGPROF is 27 (0x1b), SIGALRM is 14 (0x0e). Each constant appears in two places in glibc's profil function.

Another way is to write a ptrace debugger that changes the arguments of the setitimer and sigaction calls at run time.

osgx
  • Unfortunately, after changing the libc binary, this approach failed. The timer and signal are changed, but `profil()` (which is used by gmon, activated by `-pg`) can't profile dynamic libraries (where most blocking functions reside). Also, the `eip` seen from the signal handler while a blocking syscall is active is wrong for dynamic libs (it points into glibc, not into the syscall wrapper) and is NULL for a static link. – osgx May 12 '11 at 16:03