
I am currently practicing reading assembly by disassembling C programs and trying to understand what they do.

I am stuck with a trivial one: a simple hello world program.

#include <stdio.h>
#include <stdlib.h>

int main() {
  printf("Hello, world!");
  return(0);
}

When I disassemble the main:

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000400526 <+0>: push   rbp
   0x0000000000400527 <+1>: mov    rbp,rsp
   0x000000000040052a <+4>: mov    edi,0x4005c4
   0x000000000040052f <+9>: mov    eax,0x0
   0x0000000000400534 <+14>:    call   0x400400 <printf@plt>
   0x0000000000400539 <+19>:    mov    eax,0x0  
   0x000000000040053e <+24>:    pop    rbp
   0x000000000040053f <+25>:    ret

I understand the first two lines: the base pointer is saved on the stack (by push rbp, which decreases the stack pointer by 8, because the stack has "grown") and the value of the stack pointer is saved in the base pointer (so that parameters and local variables can be easily reached through positive and negative offsets, respectively, while the stack can keep "growing").

The third line presents the first issue: why is 0x4005c4 (the address of the "Hello, world!" string) moved into the edi register instead of being pushed onto the stack? Shouldn't the printf function take the address of that string as a parameter? As far as I know, functions take their parameters from the stack (but here it looks like the parameter is put in a register: edi).

In another post here on StackOverflow I read that "printf@plt" is like a stub function that calls the real printf function. I tried to disassemble that function, but it gets even more confusing:

(gdb) disassemble printf
Dump of assembler code for function __printf:
   0x00007ffff7a637b0 <+0>: sub    rsp,0xd8
   0x00007ffff7a637b7 <+7>: test   al,al
   0x00007ffff7a637b9 <+9>: mov    QWORD PTR [rsp+0x28],rsi
   0x00007ffff7a637be <+14>:    mov    QWORD PTR [rsp+0x30],rdx
   0x00007ffff7a637c3 <+19>:    mov    QWORD PTR [rsp+0x38],rcx
   0x00007ffff7a637c8 <+24>:    mov    QWORD PTR [rsp+0x40],r8
   0x00007ffff7a637cd <+29>:    mov    QWORD PTR [rsp+0x48],r9
   0x00007ffff7a637d2 <+34>:    je     0x7ffff7a6380b <__printf+91>
   0x00007ffff7a637d4 <+36>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   0x00007ffff7a637d9 <+41>:    movaps XMMWORD PTR [rsp+0x60],xmm1
   0x00007ffff7a637de <+46>:    movaps XMMWORD PTR [rsp+0x70],xmm2
   0x00007ffff7a637e3 <+51>:    movaps XMMWORD PTR [rsp+0x80],xmm3
   0x00007ffff7a637eb <+59>:    movaps XMMWORD PTR [rsp+0x90],xmm4
   0x00007ffff7a637f3 <+67>:    movaps XMMWORD PTR [rsp+0xa0],xmm5
   0x00007ffff7a637fb <+75>:    movaps XMMWORD PTR [rsp+0xb0],xmm6
   0x00007ffff7a63803 <+83>:    movaps XMMWORD PTR [rsp+0xc0],xmm7
   0x00007ffff7a6380b <+91>:    lea    rax,[rsp+0xe0]
   0x00007ffff7a63813 <+99>:    mov    rsi,rdi
   0x00007ffff7a63816 <+102>:   lea    rdx,[rsp+0x8]
   0x00007ffff7a6381b <+107>:   mov    QWORD PTR [rsp+0x10],rax
   0x00007ffff7a63820 <+112>:   lea    rax,[rsp+0x20]
   0x00007ffff7a63825 <+117>:   mov    DWORD PTR [rsp+0x8],0x8
   0x00007ffff7a6382d <+125>:   mov    DWORD PTR [rsp+0xc],0x30
   0x00007ffff7a63835 <+133>:   mov    QWORD PTR [rsp+0x18],rax
   0x00007ffff7a6383a <+138>:   mov    rax,QWORD PTR [rip+0x36d70f]        # 0x7ffff7dd0f50
   0x00007ffff7a63841 <+145>:   mov    rdi,QWORD PTR [rax]
   0x00007ffff7a63844 <+148>:   call   0x7ffff7a5b130 <_IO_vfprintf_internal>
   0x00007ffff7a63849 <+153>:   add    rsp,0xd8
   0x00007ffff7a63850 <+160>:   ret    
End of assembler dump.

The two mov operations on eax (mov eax, 0x0) bother me a little as well, since I don't understand their role here (but I am more concerned with what I have just described). Thank you in advance.

Peter Cordes
Mark
  • Searching for [x86-64 function args stack](http://stackoverflow.com/search?q=x86-64+function+args+stack) finds tons of related questions. None of the ones I looked at seem like an exact duplicate, but next time you're puzzled, please try searching on some of the relevant keywords. – Peter Cordes Aug 02 '16 at 18:58
  • As a suggestion, disassembling from main can sometimes be tricky. It is almost always easier to start off calling a function from main and disassembling that to start with. – David Hoelzer Aug 02 '16 at 21:33

2 Answers


gcc is targeting the x86-64 System V ABI, used by all x86-64 systems other than Windows (for various historical reasons). Its calling convention passes the first few args in registers before falling back to the stack. (See also the Wikipedia basic summary of this calling convention.)

And yes, this is different from the crusty old 32-bit calling conventions that use the stack for everything. This is a Good Thing. See also the x86 tag wiki for more links to ABI docs, and tons of other stuff.

   0x0000000000400526: push   rbp
   0x0000000000400527: mov    rbp,rsp         # stack-frame boilerplate
   0x000000000040052a: mov    edi,0x4005c4    # first arg
   0x000000000040052f: mov    eax,0x0         # 0 FP args in vector registers
   0x0000000000400534: call   0x400400 <printf@plt>
   0x0000000000400539: mov    eax,0x0         # return 0.  If you'd compiled with optimization, this and the previous mov would be  xor eax,eax
   0x000000000040053e: pop    rbp             # clean up stack frame
   0x000000000040053f: ret

Pointers to static data fit into 32 bits, which is why it can use mov edi, imm32 instead of movabs rdi, imm64.

Floating-point args are passed in SSE registers (xmm0-xmm7), even to var-args functions. al indicates how many FP args are in vector registers. (Note that C's type promotion rules mean that float args to variadic functions are always promoted to double, which is why printf doesn't have any format specifiers for float, only double and long double).
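
The promotion rule is easy to check from C. Here is a minimal sketch, assuming a hypothetical helper named sum_fp_args (not part of any library), that reads every variadic floating-point argument back as a double, including one that the caller wrote as a float:

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` floating-point variadic arguments. */
double sum_fp_args(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* Always read as double: a float argument was promoted by the caller. */
        total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}
```

A call such as sum_fp_args(2, 1.5f, 2.5) passes both values as double. On x86-64 System V they would be expected to travel in xmm0 and xmm1, with the caller setting al to 2 before the call.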


printf@plt is like a stub function that calls the real printf function.

Yes, that's right. The Procedure Linkage Table (PLT) entry is a short stub that jumps through a pointer stored in the Global Offset Table (GOT). With lazy binding, that pointer initially leads to a dynamic linker routine that resolves the symbol and updates the GOT slot, so later calls through the stub jump directly to the address where libc's printf definition is mapped. printf is an alias for __printf, which is why gdb chooses the __printf label for that address after you asked for disassembly of printf.
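
Lazy binding can be modeled in plain C with a function pointer that starts out aimed at a resolver. This is only a sketch of the idea, not the dynamic linker's actual code, and every name in it (real_function, resolver_stub, binding_slot, call_via_plt, times_resolved) is made up for illustration:

```c
/* Stand-in for the real library function (hypothetical). */
static int real_function(int x) { return x * 2; }

static int resolver_stub(int x);

/* Plays the role of the slot that the stub jumps through. */
static int (*binding_slot)(int) = resolver_stub;
static int resolve_count = 0;

/* The first call lands here: find the target, patch the slot, then forward. */
static int resolver_stub(int x)
{
    resolve_count++;
    binding_slot = real_function;
    return binding_slot(x);
}

/* What the caller sees: always an indirect call through the slot. */
int call_via_plt(int x) { return binding_slot(x); }

/* How many times the slow resolution path ran. */
int times_resolved(void) { return resolve_count; }
```

Only the first call_via_plt pays for resolution. Every later call goes straight to real_function through the patched slot.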

Dump of assembler code for function __printf:
   0x00007ffff7a637b0 <+0>: sub    rsp,0xd8               # reserve space
   0x00007ffff7a637b7 <+7>: test   al,al                  # check if there were any FP args
   0x00007ffff7a637b9 <+9>: mov    QWORD PTR [rsp+0x28],rsi  # store the integer arg-passing registers to local scratch space
   0x00007ffff7a637be <+14>:    mov    QWORD PTR [rsp+0x30],rdx
   0x00007ffff7a637c3 <+19>:    mov    QWORD PTR [rsp+0x38],rcx
   0x00007ffff7a637c8 <+24>:    mov    QWORD PTR [rsp+0x40],r8
   0x00007ffff7a637cd <+29>:    mov    QWORD PTR [rsp+0x48],r9
   0x00007ffff7a637d2 <+34>:    je     0x7ffff7a6380b <__printf+91>  # skip storing the FP arg-passing regs if there were no FP args
   0x00007ffff7a637d4 <+36>:    movaps XMMWORD PTR [rsp+0x50],xmm0
   0x00007ffff7a637d9 <+41>:    movaps XMMWORD PTR [rsp+0x60],xmm1
   0x00007ffff7a637de <+46>:    movaps XMMWORD PTR [rsp+0x70],xmm2
   0x00007ffff7a637e3 <+51>:    movaps XMMWORD PTR [rsp+0x80],xmm3
   0x00007ffff7a637eb <+59>:    movaps XMMWORD PTR [rsp+0x90],xmm4
   0x00007ffff7a637f3 <+67>:    movaps XMMWORD PTR [rsp+0xa0],xmm5
   0x00007ffff7a637fb <+75>:    movaps XMMWORD PTR [rsp+0xb0],xmm6
   0x00007ffff7a63803 <+83>:    movaps XMMWORD PTR [rsp+0xc0],xmm7
       branch_target_from_test_je:
   0x00007ffff7a6380b <+91>:    lea    rax,[rsp+0xe0]            # some more stuff

So printf's implementation keeps the var-args handling simple by storing all the arg-passing registers (except the first one, which holds the format string) in order into local arrays on the stack. It can walk a pointer through them instead of needing switch-like code to extract the right integer or FP arg. It still needs to keep track of the first 5 integer and first 8 FP args, because they aren't contiguous with the rest of the args pushed by the caller onto the stack.
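
In C source, that register-dumping and pointer-walking is hidden behind va_list. Here is a minimal sketch, assuming a made-up wrapper named format_into that forwards its variadic args to the standard vsnprintf, much as __printf forwards to _IO_vfprintf_internal in the dump above:

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: formats into buf the way printf formats to stdout. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);                        /* set up the cursor over the variadic args */
    written = vsnprintf(buf, size, fmt, ap);  /* the v* function walks them one by one */
    va_end(ap);
    return written;
}
```

Compiling a wrapper like this and disassembling it should show a prologue similar to the one in __printf, although the exact offsets depend on the compiler and its options.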

The Windows 64-bit calling convention's shadow space simplifies this by providing space for a function to dump its register args to the stack contiguous with the args already on the stack, but that's not worth wasting 32 bytes of stack on every call, IMO. (See my answer and comments on other answers on Why does Windows64 use a different calling convention from all other OSes on x86-64?)

CL.
Peter Cordes

There is nothing trivial about printf. It is not the first choice for what you are trying to do, but it turned out to be not overly complicated.

Something simpler:

extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
    return(more_fun(x)+7);
}
0000000000000000 <fun>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   e8 00 00 00 00          callq  9 <fun+0x9>
   9:   48 83 c4 08             add    $0x8,%rsp
   d:   83 c0 07                add    $0x7,%eax
  10:   c3                      retq  

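The sub/add on rsp only keeps the stack 16-byte aligned for the call; no argument is passed on the stack. x stays in edi, and eax is used for the return value.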
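
The arithmetic behind that sub $0x8,%rsp can be written down as a small check. This is a sketch under the x86-64 System V assumption that rsp must be a multiple of 16 at each call instruction; the helper name aligned_for_call is hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>

/* x86-64 System V: rsp must be a multiple of 16 at the point of a call. */
enum { STACK_ALIGNMENT = 16, RETURN_ADDRESS_SIZE = 8 };

/*
 * Hypothetical helper: given how many bytes a function has subtracted from
 * rsp or pushed since entry, report whether rsp is aligned for its next call.
 * At entry, rsp is 8 bytes off alignment because the caller's `call`
 * pushed a return address.
 */
bool aligned_for_call(size_t bytes_reserved_since_entry)
{
    return (RETURN_ADDRESS_SIZE + bytes_reserved_since_entry) % STACK_ALIGNMENT == 0;
}
```

With this model, reserving 0x8 bytes (as in the listing above) gives an aligned stack, while reserving nothing would not.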

now use a pointer

extern unsigned int more_fun ( unsigned int * );
unsigned int fun ( unsigned int x )
{
    return(more_fun(&x)+7);
}
0000000000000000 <fun>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   89 7c 24 0c             mov    %edi,0xc(%rsp)
   8:   48 8d 7c 24 0c          lea    0xc(%rsp),%rdi
   d:   e8 00 00 00 00          callq  12 <fun+0x12>
  12:   48 83 c4 18             add    $0x18,%rsp
  16:   83 c0 07                add    $0x7,%eax
  19:   c3                      retq   

and there you go edi used as in your case.

two pointers

extern unsigned int more_fun ( unsigned int *, unsigned int * );
unsigned int fun ( unsigned int x, unsigned int y )
{
    return(more_fun(&x,&y)+7);
}
0000000000000000 <fun>:
   0:   48 83 ec 18             sub    $0x18,%rsp
   4:   89 7c 24 0c             mov    %edi,0xc(%rsp)
   8:   89 74 24 08             mov    %esi,0x8(%rsp)
   c:   48 8d 7c 24 0c          lea    0xc(%rsp),%rdi
  11:   48 8d 74 24 08          lea    0x8(%rsp),%rsi
  16:   e8 00 00 00 00          callq  1b <fun+0x1b>
  1b:   48 83 c4 18             add    $0x18,%rsp
  1f:   83 c0 07                add    $0x7,%eax
  22:   c3                      retq   

now edi and esi are used. all looking like it is the calling convention to me...

a string

extern unsigned int more_fun ( const char * );
unsigned int fun ( void  )
{
    return(more_fun("Hello World")+7);
}
0000000000000000 <fun>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   bf 00 00 00 00          mov    $0x0,%edi
   9:   e8 00 00 00 00          callq  e <fun+0xe>
   e:   48 83 c4 08             add    $0x8,%rsp
  12:   83 c0 07                add    $0x7,%eax
  15:   c3                      retq  

eax is not set up here the way it is before printf, because more_fun is not variadic. For variadic functions, al holds the number of floating-point args passed in xmm registers, so try passing a double to your printf and see whether the value loaded into eax changes.

if I add -m32 on my command line then edi is not used.

00000000 <fun>:
   0:   83 ec 18                sub    $0x18,%esp
   3:   68 00 00 00 00          push   $0x0
   8:   e8 fc ff ff ff          call   9 <fun+0x9>
   d:   83 c4 1c                add    $0x1c,%esp
  10:   83 c0 07                add    $0x7,%eax
  13:   c3                      ret

The $0x0 in the push is a placeholder that the linker patches with the address of the string, because this is an unlinked object file. In 64-bit code, the first six integer or pointer args go in registers, and the stack is used only after those registers run out.

Obviously the compiler works, so this conforms to the compiler's calling convention.

extern unsigned int more_fun ( unsigned int );
unsigned int fun ( unsigned int x )
{
    return(more_fun(x+5)+7);
}
0000000000000000 <fun>:
   0:   48 83 ec 08             sub    $0x8,%rsp
   4:   83 c7 05                add    $0x5,%edi
   7:   e8 00 00 00 00          callq  c <fun+0xc>
   c:   48 83 c4 08             add    $0x8,%rsp
  10:   83 c0 07                add    $0x7,%eax
  13:   c3                      retq   

Correction based on Peter's comment: yes, it does appear that registers are being used here.

And since he mentioned 6 parameters, let's try 7.

extern unsigned int more_fun
(
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int,
unsigned int
);
unsigned int fun (
unsigned int a,
unsigned int b,
unsigned int c,
unsigned int d,
unsigned int e,
unsigned int f,
unsigned int g
)
{
    return(more_fun(a+1,b+2,c+3,d+4,e+5,f+6,g+7)+17);
}
0000000000000000 <fun>:
   0:   48 83 ec 10             sub    $0x10,%rsp
   4:   83 c1 04                add    $0x4,%ecx
   7:   83 c2 03                add    $0x3,%edx
   a:   8b 44 24 18             mov    0x18(%rsp),%eax
   e:   83 c6 02                add    $0x2,%esi
  11:   83 c7 01                add    $0x1,%edi
  14:   41 83 c1 06             add    $0x6,%r9d
  18:   41 83 c0 05             add    $0x5,%r8d
  1c:   83 c0 07                add    $0x7,%eax
  1f:   50                      push   %rax
  20:   e8 00 00 00 00          callq  25 <fun+0x25>
  25:   48 83 c4 18             add    $0x18,%rsp
  29:   83 c0 11                add    $0x11,%eax
  2c:   c3                      retq   

And sure enough, the 7th parameter was pulled from the stack, modified, and put back on the stack before the call. The other 6 are in registers.
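
The register order seen in that listing can be summarized as a lookup table. This is a minimal sketch for integer and pointer arguments only (floating-point and struct arguments follow other rules), and the helper name int_arg_location is hypothetical:

```c
#include <stddef.h>

/* Integer/pointer argument registers in the x86-64 System V ABI, in order. */
static const char *const INT_ARG_REGS[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Hypothetical helper: where does the n-th (0-based) integer argument go? */
const char *int_arg_location(size_t index)
{
    size_t reg_count = sizeof INT_ARG_REGS / sizeof INT_ARG_REGS[0];
    return index < reg_count ? INT_ARG_REGS[index] : "stack";
}
```

For the seven-argument fun above, indexes 0 through 5 map to the registers that the listing modifies in place, and index 6 maps to the stack slot read through 0x18(%rsp).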

old_timer
  • Your first example only adjusts `%rsp` so it's 16B aligned before the `call`. No args are passed on the stack. And yes, `%al` holds the number of FP args passed in xmm registers (up to 8) in the SysV x86-64 ABI. The first 6 integer args go in registers (not just 1 or 2). – Peter Cordes Aug 02 '16 at 18:25
  • The `$0x0` is a placeholder because you disassembled the `.o` instead of a linked binary, or looking at `gcc -S` output. If you used `objdump -dr`, you'd see symbol relocation info on that line. – Peter Cordes Aug 02 '16 at 18:27
  • Without linking I was being vague as to whether or not the push was a placeholder for the address; a 32-bit immediate doesn't make sense for a 64-bit address though, clearly just an offset. Not interested in linking, just reassuring the OP they are on the right track with what they are doing (compiling, disassembling and examining the results). Had the string been passed in rather than used for the first time in the function, the edi modification would not have happened. An exercise for the OP to confirm. – old_timer Aug 02 '16 at 18:33
  • Just saw this question again because of edits. 32-bit immediates do make sense for absolute addresses, actually. The default code model in the x86-64 SysV ABI puts static code/data in the low 2G of virtual address space exactly for this reason. So in a non-PIC executable, `puts("hello");` will compile to `mov $.LC0, %edi` / `call puts` to get the address into a 64-bit register. Low 2G so it also works in contexts where an immediate is sign-extended, like `pushq $imm32`. – Peter Cordes Sep 21 '17 at 09:57