6

Motivation: I can't get the Google CPU profiler to work on the machine where the code runs (with my last breath I curse libunwind :)), so I was wondering whether gdb supports high-frequency random pausing of program execution, storing the name of the function where the break occurred and counting how many times it paused in function x. That is what I would call "run time sampling"; there is probably a more precise/smarter name. I looked at oprofile, but it is too complicated to a) figure out if it can do it and b) figure out how to do it.

EDIT: apparently the correct name is "statistical sampling method".

EDIT2: the reason I'm offering a bounty for this is that I see some people on SO recommending manually breaking 10-20x and examining the stack with bt... That seems very wasteful of time, so I guesstimate some smart people have automated it. :)
EDIT3: gprof won't cut it... I tried running it recently on an ARM system and the output was trash... :( I guess its trouble with multithreading is the reason for that...

NoSenseEtAl
  • The manual sampling seems wasteful if you haven't tried it. See 1st comment [*here*](http://stackoverflow.com/a/893272/23771). Last paragraph of [*this answer*](http://stackoverflow.com/a/4832698/23771). [*This answer.*](http://stackoverflow.com/a/317160/23771) [*This answer.*](http://stackoverflow.com/a/2474118/23771) The comment by [*ErikE here*](http://stackoverflow.com/a/378024/23771). The codelidoo comment [*here*](http://stackoverflow.com/a/3097542/23771). Try it, then deprecate it. – Mike Dunlavey Jan 16 '13 at 18:05

2 Answers

3

GDB would not do this well, although you could certainly hack something up that gave wildly inaccurate results.

I'd suggest Valgrind's "Callgrind" tool. As a bonus, it requires absolutely no recompilation or other special setup. All you need is valgrind installed on your system and debug information in your program (or full symbol information, at least; I'm not sure).

You then invoke your program like this:

valgrind --tool=callgrind <your program command line>

When it's done there will be a file named callgrind.out.<pid> in the current directory. You can read and visualise this file with a very nice GUI tool called kcachegrind (usually you have to install this separately).

The only problem is that, because callgrind slows down the execution of your program's own code (typically by a large factor) but not the kernel, the time spent in system calls can appear smaller (in percentage terms) than it really would be. By default, callgrind does not include system time in its counters, so the values it gives you are a real comparison of the code in your program, if not the actual time 'under' each function. This can be confusing at first, so if that happens you can try adding --collect-systime=yes.

I'm not sure what the state of callgrind on ARM might be. ARMv7 is listed as a supported platform, but it is only described as "fairly complete", whatever that means.
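For scripted runs, the invocation above can be assembled programmatically. This is only a minimal sketch: the helper names `callgrind_command` and `annotate_command`, the program `./myprog` and the output file name are made up for illustration. The valgrind options used are `--tool=callgrind`, `--collect-systime=yes` and `--callgrind-out-file=`, and `callgrind_annotate` is the text-mode report tool that ships with valgrind.

```python
import shlex


def callgrind_command(program_argv, collect_systime=False, out_file=None):
    """Build the valgrind/callgrind command line for a program (hypothetical helper)."""
    cmd = ["valgrind", "--tool=callgrind"]
    if collect_systime:
        # Ask callgrind to also account for time spent in system calls.
        cmd.append("--collect-systime=yes")
    if out_file is not None:
        # Use a fixed output name instead of the default callgrind.out.<pid>.
        cmd.append(f"--callgrind-out-file={out_file}")
    return cmd + list(program_argv)


def annotate_command(out_file):
    """Build the command that prints a text summary of a callgrind output file."""
    return ["callgrind_annotate", out_file]


# "./myprog" and "callgrind.out.myprog" are placeholder names.
run = callgrind_command(["./myprog", "--input", "data.txt"],
                        collect_systime=True,
                        out_file="callgrind.out.myprog")
print(shlex.join(run))
print(shlex.join(annotate_command("callgrind.out.myprog")))
```

The printed lines can be pasted into a shell, or the lists can be passed to `subprocess.run` on a machine that actually has valgrind installed.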

ams
  • I don't know if valgrind is an option, since I didn't write the code and I don't know if it depends on some timeouts, i.e. cancel if x didn't happen within 200 ms of requesting x. :) – NoSenseEtAl Jan 21 '13 at 11:50
  • It runs slower, on x86_64, but still at quite a respectable speed. I can't speak for ARM. Give it a go, it doesn't take much effort. – ams Jan 21 '13 at 13:19
  • Wait, isn't the valgrind slowdown 10x or more? And I can't just try it because the target HW doesn't have valgrind or an internet connection. Not even GCC. Just lovely GDB. :) – NoSenseEtAl Jan 21 '13 at 13:51
  • I think callgrind is a bit better than memcheck, but I'm not sure. No Internet is a tricky problem indeed. :) – ams Jan 21 '13 at 13:53
  • Incidentally, a 200ms timeout on an I/O event will still work fine, since that's all running in the kernel, outside valgrind. If you have multiple threads then that's different, and a lot harder to profile anyway. – ams Jan 21 '13 at 13:55
3

You can manually sample in GDB by pausing it at run time.

What you seem to think you want is gprof, but if your goal is to make the program as fast as possible, then I would suggest that:

  • High frequency of sampling is not helpful.

  • Counting the number of samples where the program counter is in function X is not helpful except in artificially small programs.

If you follow that link, you will see the reasons why, and directions for how to do it successfully.
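For readers who want to automate the pausing, here is a rough sketch rather than a tested tool. It assumes gdb is on the PATH and is allowed to attach to the target process, it only looks at the thread gdb selects on attach, and all function names in it are made up for illustration. It keeps every whole stack and reports, per function, in how many samples that function appears anywhere on the stack, rather than only counting the innermost frame.

```python
import collections
import random
import re
import subprocess
import sys
import time

# Matches gdb backtrace frames such as
#   "#1  0x00000000004005d4 in compute (n=3) at main.c:10"
#   "#0  main () at main.c:20"
# Names containing spaces (some C++ templates) are cut at the first space.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def parse_stack(gdb_output):
    """Return the function names of one backtrace, innermost frame first."""
    stack = []
    for line in gdb_output.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            stack.append(match.group(1))
    return stack


def sample_once(pid):
    """Attach gdb to pid, print one backtrace, then detach (gdb batch mode)."""
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "bt"],
        capture_output=True, text=True, check=False,
    )
    return parse_stack(result.stdout)


def take_samples(pid, samples=20, mean_interval=0.5):
    """Collect stacks at randomly spaced moments; a few samples, not thousands."""
    stacks = []
    for _ in range(samples):
        stacks.append(sample_once(pid))
        time.sleep(random.uniform(0, 2 * mean_interval))
    return stacks


def tally(stacks):
    """Count, per function, the samples whose stack contains it anywhere."""
    counts = collections.Counter()
    for stack in stacks:
        counts.update(set(stack))  # a recursive function counts once per sample
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    collected = take_samples(int(sys.argv[1]))
    for name, hits in tally(collected).most_common():
        print(f"{hits:3d}/{len(collected)}  {name}")
    # The raw stacks carry more insight than the summary, so print them too.
    for stack in collected:
        print(" <- ".join(stack))
```

Saved under a name of your choice and run with the process ID as its only argument, it prints the per-function tallies followed by the raw stacks, which are worth reading individually.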

Mike Dunlavey
  • High frequency of sampling without having to compile with gprof flags would be helpful. – rogerdpack Jun 14 '12 at 21:18
  • @rogerdpack: I don't actually think so, as per *[this link](http://scicomp.stackexchange.com/a/1870/1262)*. Maybe it's a bit of a mind-stretch, but if something takes enough time to be worth fixing, you will see it in 10 samples. The other 990 samples don't tell you any more. They actually tell you less, because they summarize away all the insight that tells you what's going on. – Mike Dunlavey Jun 14 '12 at 21:38
  • Ok I see in your link: http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux that it lists the other tools that may be useful like lsstack, oprofile, etc. Thanks! See also my own answer there, which lists what I could find of sampling profilers: http://stackoverflow.com/a/11143125/32453 – rogerdpack Jun 21 '12 at 17:03