Why does gcc use a relative address to the function pointer in assembly?

Question

The C source:

int sum(int a, int b) {    
    return a + b;    
}    

int main() {    
    int (*ptr_sum_1)(int,int) = sum;   // assign the address of the "sum" 
    int (*ptr_sum_2)(int,int) = sum;   // to the function pointer 
    int (*ptr_sum_3)(int,int) = sum;    

    int a = (*ptr_sum_1)(2,4);   // call the "sum" through the pointer 
    int b = sum(2,4);            // call the "sum" by usual way

    return 0;    
}

The crucial part of the assembly code:

lea rax, sum[rip]
mov QWORD PTR -24[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -16[rbp], rax
lea rax, sum[rip]
mov QWORD PTR -8[rbp], rax

The executing program instructions from GDB:

   0x5fa <sum>: push   rbp
   0x5fb <sum+1>:   mov    rbp,rsp
   0x5fe <sum+4>:   mov    DWORD PTR [rbp-0x4],edi
   0x601 <sum+7>:   mov    DWORD PTR [rbp-0x8],esi
   0x604 <sum+10>:  mov    edx,DWORD PTR [rbp-0x4]
   0x607 <sum+13>:  mov    eax,DWORD PTR [rbp-0x8]
   0x60a <sum+16>:  add    eax,edx
   0x60c <sum+18>:  pop    rbp
   0x60d <sum+19>:  ret    
   0x60e <main>:    push   rbp
   0x60f <main+1>:  mov    rbp,rsp
   0x612 <main+4>:  sub    rsp,0x20
   0x616 <main+8>:  lea    rax,[rip+0xffffffffffffffdd]        # 0x5fa <sum>
   0x61d <main+15>: mov    QWORD PTR [rbp-0x18],rax
   0x621 <main+19>: lea    rax,[rip+0xffffffffffffffd2]        # 0x5fa <sum>
   0x628 <main+26>: mov    QWORD PTR [rbp-0x10],rax
   0x62c <main+30>: lea    rax,[rip+0xffffffffffffffc7]        # 0x5fa <sum>
   0x633 <main+37>: mov    QWORD PTR [rbp-0x8],rax
   0x637 <main+41>: mov    rax,QWORD PTR [rbp-0x18]
   0x63b <main+45>: mov    esi,0x4
   0x640 <main+50>: mov    edi,0x2
   0x645 <main+55>: call   rax
   0x647 <main+57>: mov    DWORD PTR [rbp-0x20],eax
   0x64a <main+60>: mov    esi,0x4
   0x64f <main+65>: mov    edi,0x2
   0x654 <main+70>: call   0x5fa <sum>
   0x659 <main+75>: mov    DWORD PTR [rbp-0x1c],eax
   0x65c <main+78>: mov    eax,0x0
   0x661 <main+83>: leave  
   0x662 <main+84>: ret

I think that the sum label is just the starting address of the sum procedure - 0x5fa, so I don't understand why gcc can't use it directly, but uses the calculation sum[rip] for this.

Question:

Why is sum[rip] used in the lea rax, sum[rip] instruction in assembly, instead of the simple sum label, e.g. lea rax, sum?
Will the mov rax, 0x5fa instruction do the same? Because we know the sum address after linking: the call 0x5fa <sum> instruction just uses it directly.

*The executing program instructions from GDB:* no, it's not executing yet. You disassembled the executable from inside GDB, but the addresses are offsets from the image base (which isn't decided until the process starts). *After* a `start` command, you'd see addresses like `0x5555555546aa` for `main`. This address is *not* a link-time constant so it can't be used as a 32-bit immediate for `mov`. (Also it doesn't fit in 32 bits, but static addresses in position-dependent executables *do*, on Linux.) — Peter Cordes, Feb 17 '19 at 00:49
@PeterCordes You're right, I just did `x /30i sum`, without `start`. I was wondering why all the addresses were so short, because usually they look like `0x5555555545fa`, as you said :) – MiniMax, Feb 17 '19 at 07:46
@PeterCordes "This address is not a link-time constant so it can't be used as a 32-bit immediate for mov." But why does this instruction use it as a constant: `0x654 <main+70>: call 0x5fa <sum>`? – MiniMax, Feb 18 '19 at 09:12
Near call/jmp use `rel32` encodings (https://www.felixcloutier.com/x86/CALL.html), and the distance between two static addresses *is* a link-time constant. (Or assemble-time, for locations from the same source file). GDB's disassembler fills in the absolute address because that's more useful, but if you look at the hexdump you'll see the relative encoding. Use `disas /r` in GDB, or use `objdump -d`. – Peter Cordes, Feb 18 '19 at 12:34

Zan Lynx · Answer 1 · 2019-02-17T01:43:20.670


I believe that it might depend on your build of GCC, but on the Linux distributions that I use everything is set up to default to PIC builds. That's Position Independent Code. It's better for both shared libraries and executables, because the result can be mapped into memory anywhere without needing a fixup pass. It's better for security because ASLR can be applied.

With x86-64 there's no significant penalty for using PIC so why wouldn't it be used everywhere?

edited Feb 17 '19 at 01:43

answered Feb 17 '19 at 00:05


There is a penalty for PIC here: an extra `lea` instruction has to be used where a normal displacement would have sufficed otherwise. And no, until recently, PIC was not used by default in amd64 binaries on UNIX. This is a rather recent development known as PIE. I have downvoted your answer because it is factually incorrect and does not capture the actual motivations for introducing PIE. – fuz Feb 17 '19 at 00:34
The penalty for position-independent code is small but non-zero. The penalty for `gcc -fPIC` (to make shared-library safe code that respects symbol interposition) is significantly higher for access to non-`hidden` global variables and inability to inline functions. Anyway, non-PIE executables don't use fixups, they have a fixed load address chosen by the linker, so `mov edi, imm32` can put a static address into a register. And you can index an array with `[table + rcx*4]`. See [32-bit absolute addresses no longer allowed in x86-64 Linux?](//stackoverflow.com/q/43367427) – Peter Cordes Feb 17 '19 at 00:44