18

I need to estimate the exact starting location of some hotspot in a program, in terms of x86 machine instruction count (so that it can later be run in some emulator/simulator). Is there a way to use gdb to count the number of machine instructions being executed up to a breakpoint?

There are other alternatives, of course: I could use an emulation / binary instrumentation tool (like Pin) and track the run while counting instructions, but that would require installing the tool on every platform I work on, which is not always possible. I need a tool that's available on pretty much any Linux machine.

With gdb, I guess it's also possible to run stepi X over large strides as a coarse-grained search until we hit the breakpoint, then repeat with a smaller stride, but that would be excruciatingly slow. Is there another way to do this?

pjs
Leeor
  • GDB is completely unsuitable for this purpose. Use something like [PAPI](http://icl.cs.utk.edu/papi/) to accurately measure how your application performs. You should have instrumentation tools everywhere you have an editor too, anyway. – Michael Foukarakis Feb 07 '14 at 12:40
  • @mfukar thanks, but i'm not sure it's easily available everywhere like GDB is. I also wouldn't say GDB is entirely unsuitable, it seems a really simple feature to add, as it already knows how to step at machine inst resolution - all it needs is to keep track of instruction count somewhere. – Leeor Feb 07 '14 at 12:45
  • `ptrace`ing a program in a debugger alters program state which may be vital to performance (cache state, TLB misses, etc). The results you'll get while running a program in a debugger apply only to that situation. – Michael Foukarakis Feb 07 '14 at 12:47
  • Why would you want to count machine instructions? If this is about profiling that's not a very useful measure. – pentadecagon Feb 07 '14 at 12:48
  • @pentadecagon, like I said - I need to run a certain section in a simulator (gem5 for e.g.), which can be triggered to start at a given instruction count – Leeor Feb 07 '14 at 12:49
  • @mfukar, TLB and cache behavior would affect performance, but not dynamic instruction count (at least not user level instructions, I don't care about external interrupts as they won't be in the simulator as well). I don't want to add any breakpoint/trap that would change the code, and I was hoping GDB would know how to count a single step without any such wrappers skewing the result. – Leeor Feb 07 '14 at 12:56
  • @Leeor "it seems a really simple feature to add, as it already knows how to step at machine inst resolution" -- of course it does: it just uses `ptrace(PTRACE_SINGLESTEP, ...)`. *That's* what makes it slow. So, no, it *doesn't* make sense to add this "feature". – Employed Russian Feb 07 '14 at 15:27
  • @EmployedRussian, I didn't say it has to be fast, I just wanted to know if there's any such capability. Like I said, Pin also does something similar when it instruments at the instruction level, and that has acceptable performance. – Leeor Feb 07 '14 at 19:05

3 Answers

25

Try this:

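# avoid the "--Type <RET> for more--" prompt during the long run
set pagination off
set $count = 0
# replace 0xyourstoppingaddress with the address of the hotspot
while $pc != 0xyourstoppingaddress
  # execute exactly one machine instruction
  stepi
  set $count++
end
print $count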

Then go get a cup of coffee. Or a long lunch.

Mark Plotnick
  • How do i use it when i want to wait for a segfault or any other signal which exists the program? – 12431234123412341234123 Jan 19 '21 at 16:58
  • @12431234123412341234123 Try changing `while $pc != 0xyourstoppingaddress` to `while $pc != 0xyourstoppingaddress && $_siginfo.si_signo != 11` to run the loop until SIGSEGV is received. The `stepi` command is going to cause the program to get a SIGTRAP signal (5), so if you want to stop on any signal other than SIGTRAP, try `while $pc != 0xyourstoppingaddress && $_siginfo.si_signo == 5` – Mark Plotnick Jan 19 '21 at 17:35
7

This is actually only a slight improvement of the usability of Mark's solution.

We can define a function do_count:

define do_count
  set $count=0
  while ($pc != $arg0)
    stepi
    set $count=$count+1
  end
  print $count
end

and then this function can be reused for counting the number of steps over and over again:

set pagination off
do_count 0xaddress1
do_count 0xaddress2

One can even put this definition into .gdbinit in the home folder (on Windows the file should be called gdb.ini), so that it becomes available automatically when gdb starts (use show user to check whether the function was loaded).
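For unattended runs, the same do_count definition can be fed to gdb from a small driver. The sketch below is only an illustration: `build_script`, `run_gdb`, and the example addresses are hypothetical, and it assumes a gdb recent enough to provide `starti` (8.1 or later) so that counting begins at the very first instruction of the process.

```python
import os
import subprocess
import tempfile

# The do_count definition from this answer, as GDB command-file text.
DO_COUNT = """\
define do_count
  set $count = 0
  while ($pc != $arg0)
    stepi
    set $count = $count + 1
  end
  print $count
end"""


def build_script(stop_addresses):
    """Return GDB command-file text that counts steps to each address in turn."""
    lines = ["set pagination off", DO_COUNT, "starti"]
    # Each do_count starts where the previous one stopped.
    lines += [f"do_count {addr:#x}" for addr in stop_addresses]
    return "\n".join(lines) + "\n"


def run_gdb(program, script_text):
    """Run gdb in batch mode on `program` (a hypothetical path); return its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as f:
        f.write(script_text)
    try:
        result = subprocess.run(
            ["gdb", "--batch", "-x", f.name, "--args", program],
            capture_output=True,
            text=True,
        )
        return result.stdout
    finally:
        os.unlink(f.name)


if __name__ == "__main__":
    # Hypothetical addresses; substitute the real hotspot addresses.
    print(build_script([0x401000, 0x401020]))
```

Calling `run_gdb("./a.out", build_script([...]))` would return everything gdb printed, including one line per `print $count`; the run is still as slow as the interactive loop, since every instruction is single-stepped.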

ead
  • How can we change the condition while ($pc != $arg0) so that we can count how many times a specific instruction has been executed? As you know, we have 2 concepts, static instruction and dynamic one. A static instruction can be executed several times. – husin alhaj ahmade Feb 26 '21 at 18:34
7

If you actually want a cycle count (maybe as an approximation of instruction count with known IPC), and you're running on bare metal ARM, you might be able to read the cycle counter, see for example Cycle counter on ARM Cortex M4 (or M3)?


In your scenario, I would try Process Record and Replay to obtain the elapsed instruction count (available since GDB 7.0 and improved afterwards):

  1. Start measurement: record btrace (or record full if the former is not available).
  2. continue execution (until a breakpoint, or use next or other commands to step through).
  3. Obtain measurement: info record
  4. Clear recorded results: record stop (recommended as the buffer is of limited size).

Example:

(gdb) record btrace
(gdb) frame
#0  __sanitizer::InitTlsSize () at .../lib/sanitizer_common/sanitizer_linux_libcdep.cc:220
220       void *get_tls_static_info_ptr = dlsym(RTLD_NEXT, "_dl_get_tls_static_info");
(gdb) info record
Active record target: record-btrace
Recording format: Branch Trace Store.
Buffer size: 64kB.
Recorded 0 instructions in 0 functions (0 gaps) for thread 1 (Thread 0xf7c92300 (LWP 20579)).
(gdb) next
226       ...
(gdb) info record
Active record target: record-btrace
Recording format: Branch Trace Store.
Buffer size: 64kB.
Recorded 2859 instructions in 145 functions (0 gaps) for thread 1 (Thread 0xf7c92300 (LWP 20579)).
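If the transcript is captured to a file or a pipe, the count can be pulled out of the `info record` summary mechanically. This is a minimal sketch that assumes the btrace-style "Recorded N instructions in M functions (G gaps)" line shown above; `parse_info_record` is a hypothetical helper, and other record targets may print a different summary that it will not match.

```python
import re

# Matches the summary line printed by `info record` in the transcript above.
RECORD_LINE = re.compile(
    r"Recorded (\d+) instructions in (\d+) functions \((\d+) gaps\)"
)


def parse_info_record(output):
    """Return (instructions, functions, gaps) from `info record` text, or None."""
    match = RECORD_LINE.search(output)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    sample = (
        "Active record target: record-btrace\n"
        "Recording format: Branch Trace Store.\n"
        "Buffer size: 64kB.\n"
        "Recorded 2859 instructions in 145 functions (0 gaps) "
        "for thread 1 (Thread 0xf7c92300 (LWP 20579)).\n"
    )
    print(parse_info_record(sample))
```

A non-zero gap count is worth checking before trusting the number, since a trace with gaps is presumably missing some instructions.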

Limitations:

  • The record buffer has a limited size (for the BTS format above it can be increased with set record btrace bts buffer-size <size>; see the documentation for other formats).
  • With record full, not all instructions can be captured. Notably, SSE and AVX instructions are unsupported and will cause gdb to pause execution.
  • Recording every instruction adds some overhead (especially with the full format), though it should not be as bad as the gdb step approach described in the other answers, which has to go through ptrace for every instruction.
Peter Cordes
Lekensteyn
  • Nice answer, but why mention cycle counting? This question is about exact *instruction* counts, and IPC is variable depending on many microarchitectural factors including memory / cache contention from other cores, so it's not even exactly repeatable for the same code. – Peter Cordes Apr 10 '19 at 01:16
  • That might be true for x86, but if you run bare metal ARM with a single core, they will be the same. That was exactly the case I needed a solution for. – Lekensteyn Apr 14 '19 at 12:15
  • If your ARM doesn't have a cache or branch prediction that can be hot or not, and doesn't have possible contention from DMA, then sure. e.g. Cortex-M3. But faster ARM CPUs aren't necessarily as deterministic, where different input data could give different IPC because of more or less cache locality. Actually, not all instructions on Cortex-M3 cost the same cycles (taken branches cost extra to reload the pipeline), so reading the cycle timer still doesn't answer the question there. – Peter Cordes Apr 14 '19 at 15:43