
I'm trying to learn assembly by compiling simple functions and looking at the output.

I'm looking at calling functions in other libraries. Here's a toy C function that calls a function defined elsewhere:

void give_me_a_ptr(void*);

void foo() {
    give_me_a_ptr("foo");
}

Here's the assembly produced by gcc:

$ gcc -Wall -Wextra -g -O0 -c call_func.c
$ objdump -d call_func.o 

call_func.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   bf 00 00 00 00          mov    $0x0,%edi
   9:   e8 00 00 00 00          callq  e <foo+0xe>
   e:   90                      nop
   f:   5d                      pop    %rbp
  10:   c3                      retq   

I was expecting something like call <give_me_a_ptr@plt>. Why is this jumping to a relative position before it even knows where give_me_a_ptr is defined?

I'm also puzzled by mov $0, %edi. This looks like it's passing a null pointer -- surely mov $address_of_string, %rdi would be correct here?

Wilfred Hughes
  • Use `objdump -dr` to show relocation entries. – Jester May 01 '16 at 15:25
  • FYI, `clang -S` emits this instead of your mov 0x0/callq pair: `leaq L_.str(%rip), %rdi; callq _give_me_a_ptr`. – John Zwinck May 01 '16 at 15:26
  • As for the PLT, that's only used if you compile PIC, use `-fPIC` flag. – Jester May 01 '16 at 15:29
  • gcc doesn't default to PIC on linux (it does that on darwin, though), unlike, for example, clang. That's why you see absolute addressing here. A good way to learn what the compiler emits for what, is to use `objdump -Sr`, this will intermix the assembly with source, and display the relocation entries. – Leandros May 01 '16 at 15:37
  • @JohnZwinck: Are you on OS X, where executables have to be position-independent? For code where absolute addressing is allowed, [clang uses `mov`-immediate](https://godbolt.org/g/V52xf5) just like gcc. – Peter Cordes May 01 '16 at 21:58
  • @PeterCordes: Indeed I did use OS X for my above comment. Thanks for pointing that out. – John Zwinck May 02 '16 at 00:39

2 Answers


You're not building with symbol-interposition enabled (a side-effect of -fPIC), so the call destination address can potentially be resolved at link time to an address in another object file that is being statically linked into the same executable (e.g. gcc foo.o bar.o).

However, if the symbol is only found in a library that you're dynamically linking to (gcc foo.o -lbar), the call has to be indirected through the PLT, because the function's address isn't known until the dynamic linker loads the library at runtime.

Now this is the tricky part: without -fPIC or -fPIE, gcc still emits asm that calls the function directly:

int puts(const char*);         // puts exists in libc, so we can link this example
void call_puts(void) { puts("foo"); }

    # gcc 5.3 -O3   (without -fPIC)
    movl    $.LC0, %edi      # absolute 32bit addressing: slightly smaller code, because static data is known to be in the low 2GB, in the default "small" code model
    jmp     puts             # tail-call optimization.  Same as call puts/ret, except for stack alignment

But if you look at the linked binary: (on this Godbolt compiler explorer link, click the "binary" button to toggle between gcc -S asm output and objdump -dr disassembly)

    # disassembled linker output
    mov    $0x400654,%edi
    jmpq   400490 <puts@plt>

During linking, the call to puts was "magically" replaced with indirection through puts@plt, and a puts@plt definition is present in the linked executable.

I don't know the details of how this works, but it's done at link time when linking to a shared library. Crucially, it doesn't require anything in the header files to mark the function prototype as being in a shared library. You get the same results from including <stdio.h> as you do from declaring puts yourself. (Declaring library functions yourself is strongly discouraged; it's probably legal for a C implementation to only work properly with the declarations in its headers. It happens to work on Linux, though.)


When compiling a position-independent executable (with -fPIE), the linked binary jumps to puts through the PLT, exactly as it does in the non-PIC build above. However, the compiler's asm output is different (try it yourself on the Godbolt link above):

call_puts:  # compiled with -fPIE
    leaq    .LC0(%rip), %rdi      # RIP-relative addressing for static data
    jmp     puts@PLT

The compiler forces indirection through the PLT for any call to a function whose definition it can't see. I don't understand why: in PIE mode, we're compiling code for an executable, not a shared library, so the linker should be able to link multiple object files into a position-independent executable with direct calls between functions defined in the executable. I'm testing on Linux (my desktop and Godbolt), not OS X, where I assume gcc -fPIE is the default; it might be configured differently there.


With -fPIC instead of -fPIE, things are even worse: even calls to global functions defined within the same compilation unit have to go through the PLT, to support symbol interposition. (e.g. LD_PRELOAD=intercept_some_functions.so ./a.out)

The main difference between -fPIC and -fPIE is that PIE can assume no symbol interposition for functions in the same compilation unit, but PIC can't. OS X requires position-independent code for executables as well as shared libraries, but there is still a difference in what the compiler can do when making code for a library vs. making code for an executable.

This Godbolt example has some more functions that demonstrate stuff about PIC and PIE mode, e.g. that call_puts() can't inline into another function in PIC mode, only PIE.
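As a rough stand-in for that kind of experiment, here is a minimal sketch. The function names are hypothetical and are not taken from the linked example. The idea is to compile it with gcc -O2 -fPIC and then with gcc -O2 -fPIE and compare the asm. With -fPIC, gcc should keep a real call to add_one (through the PLT) inside via_interposable, because add_one could be interposed at runtime, while the hidden-visibility variant is free to be inlined. With -fPIE, both are candidates for inlining. Other compilers or flags such as -fno-semantic-interposition may behave differently.

```c
/* Global, default visibility: under gcc -fPIC this symbol may be
   interposed at runtime (e.g. via LD_PRELOAD), so callers in this
   file cannot assume this body is the one that runs. */
int add_one(int x) { return x + 1; }

/* Hidden visibility (a GCC/Clang attribute): the symbol is not
   exported from the shared object, so it cannot be interposed and
   the compiler may inline it even with -fPIC. */
__attribute__((visibility("hidden")))
int add_one_hidden(int x) { return x + 1; }

/* Compare the generated asm for these two callers. */
int via_interposable(int x) { return add_one(x) * 2; }
int via_hidden(int x)       { return add_one_hidden(x) * 2; }
```

Both callers compute the same result; only the generated call sequence differs.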

See also: Shared object in Linux without symbol interposition, -fno-semantic-interposition error.


"puzzled by mov $0, %edi"

You're looking at disassembly output from the .o, where addresses are just placeholder 0s that will be replaced by the linker at link time, based on the relocation information in the ELF object file. That's why @Leandros suggested objdump -r.

Similarly, the relative displacement in the call machine code is all-zeros, because the linker hasn't filled it in yet.
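To make the arithmetic concrete, here is a small sketch of how a call rel32 displacement turns into a target address, and how a PC-relative relocation fills it in. It assumes the call carries an R_X86_64_PC32-style relocation with an addend of -4, which is the usual case on x86-64 ELF; the function names are made up for illustration, and the "linked" addresses in the usage note are hypothetical, since a real link would also move foo away from address 0.

```c
#include <assert.h>
#include <stdint.h>

/* PC-relative 32-bit relocation: the linker stores S + A - P, where
   S = symbol address, A = addend from the relocation entry,
   P = address of the 4-byte field being patched. */
int32_t reloc_pc32(uint64_t S, int64_t A, uint64_t P)
{
    int64_t value = (int64_t)S + A - (int64_t)P;
    /* A real linker reports an error instead if the value does not
       fit in 32 bits ("relocation truncated to fit"). */
    assert(value >= INT32_MIN && value <= INT32_MAX);
    return (int32_t)value;
}

/* Target of a 5-byte `call rel32` (opcode e8): the displacement is
   relative to the address of the NEXT instruction. */
uint64_t call_target(uint64_t insn_addr, int32_t rel32)
{
    return insn_addr + 5 + (int64_t)rel32;
}
```

With the unlinked placeholder, call_target(0x9, 0) gives 0xe, which is why objdump prints callq e <foo+0xe>. If the field at offset 0xa were patched for a symbol at the hypothetical address 0x400490, reloc_pc32(0x400490, -4, 0xa) yields the displacement that makes the same instruction reach 0x400490.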

l'L'l
Peter Cordes

I'm still studying this linking process myself, but wanted to restate something in my own words. The PLT-related user function calls might not all be stuffed with the proper code by the time execution starts. Doing so could take a lot of time at the start of execution; and not all the function calls instrumented by the PLT might even be used. So under a 'lazy binding' method, the very first time a 'user' function is called through the PLT code, it always jumps to the PLT 'binding function' first. The binding function goes out and finds the right address for the 'user' function (I think from the GOT) and then replaces the PLT entry (that points to the binding function) with the code pointing to the 'user' function. So thereafter every time the user function is called, the 'lazy' binding function is not called; the 'user' function is called instead. This might be why the PLT entry looks odd at first blush; it's pointing to the binding function and not to the 'user' function.

  • Deferred binding happens after the first call to a PLT entry and does not affect whether compiler uses `call puts` or `call puts@PLT`. Also it doesn't modify the PLT, instead it modifies the GOT. The PLT is read-only and uses indirect JMP instructions in the form of `puts@PLT: jmp [puts@GOTPLT]`. Initially the `puts@GOTPLT` entry in the GOT points at code that calls the lazy binding code, after the shared library has been bound it points to `puts` in the shared library. – Ross Ridge Mar 09 '17 at 21:19
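The lazy binding described in this comment can be modelled in plain C with a function pointer standing in for the GOT slot. This is only a toy model under stated assumptions: every name here (real_puts, puts_plt, puts_got_slot, lazy_resolver) is hypothetical, a real PLT entry is a few machine instructions rather than a C function, and the real resolver lives in the dynamic linker and looks the symbol up by name.

```c
/* Counters so the effect of lazy binding can be observed. */
int real_puts_calls = 0;
int resolver_calls = 0;

/* Stand-in for the function that lives in the shared library. */
int real_puts(const char *s) { (void)s; return ++real_puts_calls; }

typedef int (*puts_fn)(const char *);

int lazy_resolver(const char *s);

/* The "GOT slot": writable data, initially pointing at the resolver. */
puts_fn puts_got_slot = lazy_resolver;

/* The "PLT entry": never modified, it always jumps through the slot. */
int puts_plt(const char *s) { return puts_got_slot(s); }

/* First call lands here: find the real function, patch the GOT slot
   (not the PLT), then forward the call. */
int lazy_resolver(const char *s)
{
    resolver_calls++;
    puts_got_slot = real_puts;
    return real_puts(s);
}
```

Calling puts_plt twice runs the resolver only once; the second call goes straight to real_puts through the patched slot.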