How does GDB determine the address to break at when you do "break function-name"?

Question

A simple example that demonstrates my issue:

// test.c
#include <stdio.h>

int foo1(int i) {
    i = i * 2;
    return i;
}

void foo2(int i) {
    printf("greetings from foo! i = %i", i);
}

int main() {
    int i = 7;
    foo1(i);
    foo2(i);
    return 0;
}

$ clang -o test -O0 -Wall -g test.c

Inside GDB I do the following and start the execution:

(gdb) b foo1  
(gdb) b foo2

After reaching the first breakpoint, I disassemble:

(gdb) disassemble 
Dump of assembler code for function foo1:
   0x0000000000400530 <+0>:     push   %rbp
   0x0000000000400531 <+1>:     mov    %rsp,%rbp
   0x0000000000400534 <+4>:     mov    %edi,-0x4(%rbp)
=> 0x0000000000400537 <+7>:     mov    -0x4(%rbp),%edi
   0x000000000040053a <+10>:    shl    $0x1,%edi
   0x000000000040053d <+13>:    mov    %edi,-0x4(%rbp)
   0x0000000000400540 <+16>:    mov    -0x4(%rbp),%eax
   0x0000000000400543 <+19>:    pop    %rbp
   0x0000000000400544 <+20>:    retq   
End of assembler dump.

I do the same after reaching the second breakpoint:

(gdb) disassemble 
Dump of assembler code for function foo2:
   0x0000000000400550 <+0>:     push   %rbp
   0x0000000000400551 <+1>:     mov    %rsp,%rbp
   0x0000000000400554 <+4>:     sub    $0x10,%rsp
   0x0000000000400558 <+8>:     lea    0x400644,%rax
   0x0000000000400560 <+16>:    mov    %edi,-0x4(%rbp)
=> 0x0000000000400563 <+19>:    mov    -0x4(%rbp),%esi
   0x0000000000400566 <+22>:    mov    %rax,%rdi
   0x0000000000400569 <+25>:    mov    $0x0,%al
   0x000000000040056b <+27>:    callq  0x400410 <printf@plt>
   0x0000000000400570 <+32>:    mov    %eax,-0x8(%rbp)
   0x0000000000400573 <+35>:    add    $0x10,%rsp
   0x0000000000400577 <+39>:    pop    %rbp
   0x0000000000400578 <+40>:    retq   
End of assembler dump.

GDB obviously uses different offsets (+7 in foo1 and +19 in foo2), with respect to the beginning of the function, when setting the breakpoint. How can I determine this offset by myself without using GDB?

Are you interested in the stack frame as well? If so you don't need that offset. It seems that GDB is deliberately skipping stack frame adjustments. — Marco A., Aug 28 '14 at 10:02
Isn't it simply the first instruction of the function's body ? — Quentin, Aug 28 '14 at 10:02
It looks like GDB skips the function prologue and sets breakpoint to the first Assembly line that matches the first C source line. It gets the information from .PDB file, finally this is C compiler which provides this information. I don't think that this can be called "function entry point". Function is always executed from the first Assembly line. — Alex F, Aug 28 '14 at 10:10
Forgot to mention, .pdb is from Windows, in Linux debug information is kept in another way, but the meaning is the same. — Alex F, Aug 28 '14 at 10:25

score 5 · Accepted Answer · answered Aug 28 '14 at 16:29

gdb uses a few methods to determine this address.

First, the very best way is if your compiler emits DWARF describing the function. Then gdb can decode the DWARF to find the end of the prologue.

However, this isn't always available. GCC emits it, but IIRC only when optimization is used.

I believe there's also a convention that if the first line number of a function is repeated in the line table, then the address of the second instance is used as the end of the prologue. That is if the lines look like:

< function f >
line 23  0xffff0000
line 23  0xffff0010

Then gdb will assume that the function f's prologue is complete at 0xffff0010.

I think this is the mode used by gcc when not optimizing.
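The repeated-line convention above can be sketched in a few lines of C. This is only an illustration of the rule as described here, not gdb's actual code: the struct line_row type and the function prologue_end_from_lines are hypothetical names, and the rows are assumed to belong to a single function, sorted by address, with rows[0] at the function's entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a simplified line table: a source line number and the
 * address where its code starts. Hypothetical type, not a gdb structure. */
struct line_row {
    unsigned line;
    uint64_t addr;
};

/* Apply the convention: if the function's first line number appears again
 * in a later row, that row's address is taken as the end of the prologue.
 * If the line is never repeated, fall back to the function's entry address.
 * Assumes rows[0] is the function's first row and n > 0. */
uint64_t prologue_end_from_lines(const struct line_row *rows, size_t n)
{
    assert(rows != NULL && n > 0);

    for (size_t i = 1; i < n; i++) {
        if (rows[i].line == rows[0].line)
            return rows[i].addr;   /* second instance of the first line */
    }
    return rows[0].addr;           /* no repeat: nothing to skip */
}
```

With the two rows from the example (line 23 at 0xffff0000 and line 23 at 0xffff0010), the function returns 0xffff0010.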

Finally gdb has some prologue decoders that know how common prologues are written on many platforms. These are used when debuginfo isn't available, though offhand I don't recall what the purpose of that is.
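To give a feel for what such a prologue decoder does, here is a minimal sketch for x86-64 that recognizes only the frame-pointer setup seen in the question's disassembly: push %rbp, mov %rsp,%rbp, and an optional sub $imm8,%rsp. The function name skip_simple_prologue is hypothetical, and gdb's real decoders handle many more patterns and architectures, so treat this as an assumption-laden illustration rather than what gdb does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the number of bytes at the start of code that match a simple
 * frame-pointer prologue. Returns 0 if the code does not start with
 * push %rbp. Only the three patterns below are recognized. */
size_t skip_simple_prologue(const uint8_t *code, size_t len)
{
    static const uint8_t push_rbp[]     = { 0x55 };             /* push %rbp      */
    static const uint8_t mov_rsp_rbp[]  = { 0x48, 0x89, 0xe5 }; /* mov %rsp,%rbp  */
    static const uint8_t sub_imm8_rsp[] = { 0x48, 0x83, 0xec }; /* sub $imm8,%rsp */
    size_t off = 0;

    assert(code != NULL || len == 0);

    if (len - off < sizeof push_rbp ||
        memcmp(code + off, push_rbp, sizeof push_rbp) != 0)
        return 0;
    off += sizeof push_rbp;

    if (len - off < sizeof mov_rsp_rbp ||
        memcmp(code + off, mov_rsp_rbp, sizeof mov_rsp_rbp) != 0)
        return off;
    off += sizeof mov_rsp_rbp;

    /* The sub instruction is the 3 opcode bytes plus a 1-byte immediate. */
    if (len - off >= sizeof sub_imm8_rsp + 1 &&
        memcmp(code + off, sub_imm8_rsp, sizeof sub_imm8_rsp) == 0)
        off += sizeof sub_imm8_rsp + 1;

    return off;
}
```

Fed the first bytes of foo1 and foo2 from the question, this sketch stops at +4 and +8 respectively. The +7 and +19 that gdb actually chose lie further in, which is consistent with gdb relying on line information rather than on instruction decoding for that binary.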

score 5 · Answer 2 · edited May 23 '17 at 12:22

As others mentioned, even without debugging symbols GDB has a function prologue decoder, i.e. heuristic magic.

To disable that, you can add an asterisk before the function name:

break *func

In Binutils 2.25, the skip algorithm seems to be in symtab.c:skip_prologue_sal, which breakpoint.c:break_command, the command definition, calls indirectly.

The prologue is common "boilerplate" code at the start of a function.

The prologue of foo2 is longer than that of foo1 by two instructions:

sub $0x10,%rsp

foo2 calls another function, so it is not a leaf function. This prevents some optimizations; in particular, it must decrement rsp before the call to reserve room for its local state.

Leaf functions don't need that because of the 128 byte ABI red zone, see also: Why does the x86-64 GCC function prologue allocate less stack than the local variables?

foo1, however, is a leaf function.

lea 0x400644,%rax

For some reason, clang stores the address of local string constants (stored in .rodata) in registers as part of the function prologue.

We know that rax contains "greetings from foo! i = %i" because it is then passed to %rdi, the first argument of printf.

foo1, however, has no local string constants.

The other instructions of the prologue are common to both functions:

rbp manipulation is discussed at: What is the purpose of the EBP frame pointer register?
mov %edi,-0x4(%rbp) stores the first argument on the stack. This is not required on leaf functions, but clang does it anyways. It makes register allocation easier.

score 2 · Answer 3 · answered Aug 28 '14 at 10:03

On ELF platforms like Linux, debug information is stored in separate (non-executable) sections of the executable. These sections hold all the information the debugger needs. Check the DWARF2 specification for the specifics.

doron
