
I am trying to profile my code with the Random Pause Method. I essentially run the code under a GDB session and look at the call stack at random times using Ctrl+C and the backtrace command. It seems to work: the slowest part of the code (a loop) is in the stack on almost all pauses, and I can see a pattern in what is on the stack.

Here is the problem. I'm trying to automate the profiling process with the following shell script:

while true
  do
    pid=$(pidof MyCode)
    if [ -n "$pid" ]; then
      gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p "$pid" >> Log.txt
      sleep 1
    else
      break
    fi
  done

When I check the output file Log.txt, to my surprise the slow part of the code never appears in the stack!

Q: How can profiling within a GDB session and profiling by calling GDB from a script give different results?

Some notes:

  • I tried both methods many times using different numbers of samples (from 5 to 50)
  • It is a C++ code
  • The slow function is a loop parallelized with OpenMP
  • I can't show the code here. I tried to reproduce this behaviour in a smaller program, but without success

EDIT: I think I have a clue of what is going on here. The fact that the code is multithreaded has something to do with that.

In the script above, if I put the gdb command between kill -SIGSTOP $pid and kill -SIGCONT $pid and set the variable GOMP_CPU_AFFINITY, I get results similar to those from a GDB session. My guess is that the script can't execute the gdb command while the code is in the parallel loop, because all cores are busy.
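For reference, the stop/sample/continue variant I describe above looks roughly like this. This is only a sketch: "MyCode" stands in for the real binary, and the GOMP_CPU_AFFINITY value "0-3" is an assumed core list that actually has to be set in the environment MyCode is launched from, not in this script.

```shell
# Sketch of the stop/sample/continue loop. Assumptions: the binary is
# named MyCode, and GOMP_CPU_AFFINITY (here "0-3") was exported before
# MyCode was launched, pinning its OpenMP threads to cores 0-3.
export GOMP_CPU_AFFINITY="0-3"

while true
  do
    pid=$(pidof MyCode)
    if [ -n "$pid" ]; then
      kill -SIGSTOP "$pid"   # freeze all threads before attaching
      gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p "$pid" >> Log.txt
      kill -SIGCONT "$pid"   # let the program run again
      sleep 1
    else
      break
    fi
  done
```

Stopping the process first means GDB attaches to an already-suspended target instead of competing with the OpenMP threads for CPU time.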

  • Why don't you use a proper sampling profiler, like linux' `perf`? – Ilya Popov Dec 08 '15 at 01:25
  • @IlyaPopov short answer: I want to see what is in the stack. Long answer: please check the link on my question and references therein. – montefuscolo Dec 08 '15 at 13:30
  • Maybe always sleeping 1 second isn't random enough. Do you see anything different if you replace `sleep 1` with `sleep 0.$RANDOM` ? – Mark Plotnick Dec 08 '15 at 16:44
  • @IlyaPopov: When speed is your goal, random pausing, while crude, works better because it finds a superset of the speedups that proper profilers find. Reasons: a) They tend to sample on-cpu or off-cpu, but not on wall-time. b) In analysis of samples, they replace quality with quantity (because they are more interested in measuring than finding speedups). [*This post*](http://stackoverflow.com/a/25870103/23771) shows how easy it is for speedups to hide from profilers. [*This post*](http://programmers.stackexchange.com/a/302345/2429) shows why that's a killer. – Mike Dunlavey Dec 08 '15 at 23:41
  • I've never understood why anybody wants to automate this, because what makes it valuable is the attention you give to each sample. You can see things no sample summarizer can. I'm sure you know the principle behind it: if something will save fraction F of time, the number of samples you need to see it twice, on average, is 2/F. If F is 0.5: 4 samples. If F is 0.2: 10 samples. If F is 0.8: 2.5 samples. Infinite loop: 2 samples. The bigger the potential speedup, the fewer the samples. – Mike Dunlavey Dec 09 '15 at 01:01
  • You are most likely right with your clue about _all cores are busy._ Consider posting that as an answer. – Armali Sep 04 '17 at 07:41

0 Answers