I'm trying to translate the following assembly code to C code:

fct:
    movl 4(%esp), %eax
    cmpl $0,%eax
    jg n
    movl $-1,%eax
    ret
n:  movl $0,%ebx
    movl %eax, %ecx
    movl $0, %eax
    movl $0, %edx
l:  addl $2, %ebx
    addl %ebx, %eax
    addl $1, %edx
    cmpl %ecx, %edx
    jl l
    ret

While I think I can translate most of this pretty easily, I can't work out what the first line (movl 4(%esp), %eax) does. What does 4(%esp) refer to in this context? As far as I know, the %esp register refers to the top element of the stack and 4(%esp) refers to the second one.

1 Answer

mov is a "move" instruction. The l suffix in movl means that it operates on a "long" value (32 bits in your case). The parentheses around %esp in (%esp) mean that the operand is not the contents of register %esp itself but the memory at the address held in %esp. The 4 in 4(%esp) is an offset (displacement) that is added to %esp before it is dereferenced.

So this instruction loads a 32-bit value from address %esp + 4 and stores it in register %eax.

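In 32-bit x86 code using the default C calling convention (cdecl, or the i386 System V convention on Linux), function arguments are passed on the thread's stack; other calling conventions pass some of them in registers. So this instruction loads a function argument into register %eax.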

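With this convention the caller pushes the arguments in reverse order (from the last to the first), and the call instruction then pushes the return address. On entry, (%esp) therefore holds the return address and 4(%esp) holds the first argument, which is what gets loaded here.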
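To make the addressing concrete, here is a minimal sketch in C that models the top of the stack as a byte buffer. The names stack_model, make_stack and load_long are hypothetical helpers invented for illustration, and the layout assumes the stack-based 32-bit convention described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the 32-bit stack right after `call fct`:
   byte offset 0 holds the return address, offset 4 the first argument. */
typedef struct {
    uint8_t bytes[8];
} stack_model;

/* Build the model; both parameters are illustrative values. */
stack_model make_stack(uint32_t return_address, int32_t first_arg)
{
    stack_model s;
    memcpy(s.bytes + 0, &return_address, 4); /* what (%esp) points at  */
    memcpy(s.bytes + 4, &first_arg, 4);      /* what 4(%esp) points at */
    return s;
}

/* Model of `movl disp(%esp), %eax`: read 32 bits at address esp + disp. */
int32_t load_long(const uint8_t *esp, int disp)
{
    int32_t eax;
    memcpy(&eax, esp + disp, sizeof eax);
    return eax;
}
```

With a model built by make_stack(some_return_address, 7), the call load_long(s.bytes, 4) yields 7, mirroring what movl 4(%esp), %eax leaves in %eax.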

It seems that the original function was declared in C like this:

int fct(int val);

The jg instruction ("jump if greater") tests a signed comparison, so the argument is treated as a signed int and the function body appears to start with:

if (val > 0)
   ...
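Carrying the same reading through the rest of the listing gives roughly the following C. This is a sketch, not necessarily the original source: the names step, sum and i are assumptions standing in for %ebx, %eax and %edx, and the loop is written as do/while because the assembly tests its condition at the bottom.

```c
/* A possible C equivalent of the assembly in the question. */
int fct(int val)
{
    if (val <= 0)       /* cmpl $0,%eax ; jg n not taken */
        return -1;      /* movl $-1,%eax ; ret           */

    int step = 0;       /* %ebx */
    int sum = 0;        /* %eax, reused as the return value */
    int i = 0;          /* %edx */

    /* Note: for very large val the sum overflows int; the assembly
       wraps around, whereas signed overflow is undefined in C. */
    do {
        step += 2;      /* addl $2, %ebx   */
        sum += step;    /* addl %ebx, %eax */
        i += 1;         /* addl $1, %edx   */
    } while (i < val);  /* cmpl %ecx, %edx ; jl l */

    return sum;
}
```

Under this reading, fct(3) adds 2 + 4 + 6 and returns 12, while fct(0) returns -1.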
  • @EdouarddeSchaetzen if your code was called by `call fct`, then the memory at `esp+0` contains the return address, and at `esp+4` there is the last value the caller pushed onto the stack, often used as an "argument" for the function, i.e. `push eax ; argument for function in stack` `call fct ; call subroutine`. ... So to recreate this in C, if your target platform uses the stack for passing arguments, the prototype of your function should be: `int fct(int argument);` – Ped7g May 29 '18 at 10:48
  • *Because in x86 all function arguments are store on stack* only for some calling conventions. (But true for the i386 System V calling convention used on Linux for 32-bit code.) All the 64-bit x86 calling conventions pass some args in regs, and so does 32-bit Windows `__fastcall`, or the rarely-used 32-bit `gcc -m32 -mregparm=3`. – Peter Cordes May 29 '18 at 11:16
  • @PeterCordes Assembly is for x86, so `__fastcall` is basically the only alternative here, since the compiler can decide to use it automatically (which at least MSVC does for `static` functions). –  May 29 '18 at 11:25
  • x86 is a generic term, it includes 32 and 64-bit modes. (At least that's the convention in CPU-architecture terms, and in Linux. Only in the Windows world does x86 strictly mean 32-bit.) I should have just said that in the first place, because I think you meant "32-bit" when you said "x86". – Peter Cordes May 29 '18 at 11:31
  • @PeterCordes Yes, I meant 32-bit architecture. On Linux I have mostly seen i386/i486/i686 for 32-bit architecture and x86_64 or amd64 for 64-bit architecture. Sometimes people casually refer to both as "x86", but that's it; I never really saw widespread usage of x86 for x86_64. –  May 29 '18 at 11:35
  • The Linux kernel's source tree has all the i386 and x86-64/amd64 stuff under `arch/x86/...`. e.g. [the entry-points](https://github.com/torvalds/linux/tree/master/arch/x86/entry) into a 32-bit kernel in `entry_32.S`, entry into a 64-bit kernel in `entry_64.S`, and entry from 32-bit user-space into a 64-bit kernel in `entry_64_compat.S` That's one project, but it's *very* prominent. I think some other major projects have a unified x86 subdirectory. And it's very common to say this or that "is fast on x86". – Peter Cordes May 29 '18 at 11:43