
For a few hours I've been trying to improve my understanding of assembly language by reading the instructions of a very simple program I wrote in C, to learn how arguments are handled in asm.

#include <stdio.h>

int say_hello();

int main(void) {

    printf("say_hello() -> %d\n", say_hello(10, 20, 30, 40, 50, 60, 70, 80, 90, 100));

}

int say_hello(int a, int b, int c, int d, int e, int f, int g, int h, int i, int j) {

    printf("a:b:c:d:e:f:g:h:i:j -> %d:%d:%d:%d:%d:%d:%d:%d:%d:%d\n", a, b, c, d, e, f, g, h, i, j);

    return 1000;

}

The program is, as I said, very basic. It contains two functions: main, and another called say_hello, which takes 10 arguments, a to j, and prints each of them in a printf call. I went through the same process (trying to understand the instructions and what's happening) with the same program and fewer arguments, and I think I understood most of it. But then I wondered: "OK, but what happens if I have so many arguments that there aren't any registers left to store the values in?"

So I looked up how many registers were available and usable in my case, and I found out from this website that "only" (not sure, correct me if I'm wrong) the following registers could be used to store argument values: edi, esi, r8d, r9d, r10d, r11d, edx, ecx. That's 8, so I modified my C program and added a few more arguments to reach that limit of 8. I even added one more; I don't really know why, let's say just in case.

When I compiled my program with gcc and no optimization options whatsoever, I expected main() to push the values that were left over once all 8 registers had been used. I had no expectations for say_hello(), which is pretty much why I tried this in the first place.

So I compiled my program, disassembled it with objdump (the full command I used was objdump -d -M intel helloworld), and looked for my main function, which did pretty much what I expected:

000000000000064a <main>:
 64a:   55                      push   rbp
 64b:   48 89 e5                mov    rbp,rsp
 64e:   6a 64                   push   0x64
 650:   6a 5a                   push   0x5a
 652:   6a 50                   push   0x50
 654:   6a 46                   push   0x46
 656:   41 b9 3c 00 00 00       mov    r9d,0x3c
 65c:   41 b8 32 00 00 00       mov    r8d,0x32
 662:   b9 28 00 00 00          mov    ecx,0x28
 667:   ba 1e 00 00 00          mov    edx,0x1e
 66c:   be 14 00 00 00          mov    esi,0x14
 671:   bf 0a 00 00 00          mov    edi,0xa
 676:   b8 00 00 00 00          mov    eax,0x0
 67b:   e8 1e 00 00 00          call   69e <say_hello>
 680:   48 83 c4 20             add    rsp,0x20
 684:   89 c6                   mov    esi,eax
 686:   48 8d 3d 0b 01 00 00    lea    rdi,[rip+0x10b]        # 798 <_IO_stdin_used+0x8>
 68d:   b8 00 00 00 00          mov    eax,0x0
 692:   e8 89 fe ff ff          call   520 <printf@plt>
 697:   b8 00 00 00 00          mov    eax,0x0
 69c:   c9                      leave
 69d:   c3                      ret

So, as I expected, it pushed the values that were left over after the registers had been used onto the stack, and then did the usual work to pass values from one function to another. But then I looked at say_hello, and it got me really confused.

000000000000069e <say_hello>:
 69e:   55                      push   rbp
 69f:   48 89 e5                mov    rbp,rsp
 6a2:   48 83 ec 20             sub    rsp,0x20
 6a6:   89 7d fc                mov    DWORD PTR [rbp-0x4],edi
 6a9:   89 75 f8                mov    DWORD PTR [rbp-0x8],esi
 6ac:   89 55 f4                mov    DWORD PTR [rbp-0xc],edx
 6af:   89 4d f0                mov    DWORD PTR [rbp-0x10],ecx
 6b2:   44 89 45 ec             mov    DWORD PTR [rbp-0x14],r8d
 6b6:   44 89 4d e8             mov    DWORD PTR [rbp-0x18],r9d
 6ba:   44 8b 45 ec             mov    r8d,DWORD PTR [rbp-0x14]
 6be:   8b 7d f0                mov    edi,DWORD PTR [rbp-0x10]
 6c1:   8b 4d f4                mov    ecx,DWORD PTR [rbp-0xc]
 6c4:   8b 55 f8                mov    edx,DWORD PTR [rbp-0x8]
 6c7:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
 6ca:   48 83 ec 08             sub    rsp,0x8
 6ce:   8b 75 28                mov    esi,DWORD PTR [rbp+0x28]
 6d1:   56                      push   rsi
 6d2:   8b 75 20                mov    esi,DWORD PTR [rbp+0x20]
 6d5:   56                      push   rsi
 6d6:   8b 75 18                mov    esi,DWORD PTR [rbp+0x18]
 6d9:   56                      push   rsi
 6da:   8b 75 10                mov    esi,DWORD PTR [rbp+0x10]
 6dd:   56                      push   rsi
 6de:   8b 75 e8                mov    esi,DWORD PTR [rbp-0x18]
 6e1:   56                      push   rsi
 6e2:   45 89 c1                mov    r9d,r8d
 6e5:   41 89 f8                mov    r8d,edi
 6e8:   89 c6                   mov    esi,eax
 6ea:   48 8d 3d bf 00 00 00    lea    rdi,[rip+0xbf]        # 7b0 <_IO_stdin_used+0x20>
 6f1:   b8 00 00 00 00          mov    eax,0x0
 6f6:   e8 25 fe ff ff          call   520 <printf@plt>
 6fb:   48 83 c4 30             add    rsp,0x30
 6ff:   b8 e8 03 00 00          mov    eax,0x3e8
 704:   c9                      leave
 705:   c3                      ret
 706:   66 2e 0f 1f 84 00 00    nop    WORD PTR cs:[rax+rax*1+0x0]
 70d:   00 00 00

I'm sorry in advance: I'm not sure I fully understand what the square brackets do, but from what I've read, they are a way to "point" to the address containing the value I want (please correct me if I'm wrong). So, for example, mov DWORD PTR [rbp-0x4],edi moves the value in edi to the memory at address rbp-0x4, right?

I'm also not sure why this process is required. Can't say_hello just read edi, for example, and that's it? Why does the program have to move it into [rbp-0x4] and then re-reading it back from [rbp-0x4] to eax ?

So the program goes on, reading every value it needs and putting it into an available register. When there are no registers left, it starts moving the remaining values into esi one at a time and pushing them onto the stack, repeating the process until all 10 arguments have been stored somewhere.

That made sense and I was satisfied, so I went to double-check that I had really understood it. I started reading from bottom to top, from 0x6ea up to 0x6e2, so the sample I'm working on is:

 6e2:   45 89 c1                mov    r9d,r8d
 6e5:   41 89 f8                mov    r8d,edi
 6e8:   89 c6                   mov    esi,eax
 6ea:   48 8d 3d bf 00 00 00    lea    rdi,[rip+0xbf]        # 7b0 <_IO_stdin_used+0x20>

Just like in all my previous tests, I expected the arguments to go in "reverse": the first argument set by the last instruction executed, and the last argument by the first instruction executed. So I started double-checking every field.

The first one, rdi, was [rip+0xbf], which I was sure pointed to my string.

Then I moved to 0x6e8, which moves eax into esi. eax currently holds the value stored in [rbp-0x4], which came from edi as stated at 0x6a6, and edi was set to 0xa (10) at 0x671. So my first argument is my string and the second one is 10, which is exactly what I expected.

But when I moved up to the instruction executed right before 0x6e8, at 0x6e5, I expected the value to be 20, so I followed the same process. edi is moved into r8d, and edi currently holds the value stored in [rbp-0x10], which came from ecx, which was set at 0x662 to... 40? What the heck? Why would it be 40? Then I looked at the instruction right above that one and found 50, and did the same for the next one and found 60! Why? Is the way I'm tracing those values wrong? Am I missing something in the instructions? Or did I wrongly assume something from my previous programs (which all had far fewer arguments, and were all in "reverse" order as I said earlier)?

I'm sorry if this is a dumb post. I'm very new to asm (a few hours of experience!) and just trying to clear this up, as I really can't figure it out alone. I'm also sorry if this post is too long: I tried to include enough information to make clear what I'm trying to do, what result I get, and what my problem is. Thanks a lot for reading, and an even bigger thanks to anyone who helps!

Peter Cordes
Peter Wave
  • The x86-64 System V ABI's calling convention doesn't use r10 or r11 for arg-passing, just rdi, rsi, rdx, rcx, r8, r9 in that order. [What are the calling conventions for UNIX & Linux system calls on i386 and x86-64](https://stackoverflow.com/q/2535989) The regs you listed are the call-clobbered registers that functions can use *internally* without saving/restoring. [What registers are preserved through a linux x86-64 function call](https://stackoverflow.com/q/18024672) – Peter Cordes May 26 '20 at 00:58
  • See also [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) for an intro to looking at compiler output, especially Matt Godbolt's CppCon talk video. – Peter Cordes May 26 '20 at 01:02
  • "Why does the program have to move it into [rbp-0x4] and then re-reading it back from [rbp-0x4] to eax ?" Because when optimizations are off, the compiler generates really stupid code and makes no attempt to remove redundant stuff like this. If you turn on optimization, it may look more sensible. – Nate Eldredge May 26 '20 at 01:06
  • Start with simple functions like int fun ( int x, int y) { return(x+(y<<5)); }. Optimize and disassemble. You are making it more complicated than necessary... – old_timer May 26 '20 at 14:02
  • Best to learn most any instruction set other than x86 as your first one: arm, thumb, msp430, avr, risc-v, mips, etc... – old_timer May 26 '20 at 14:04

0 Answers