Looks correct except for stack alignment. I'm guessing you're on Linux, given the nasm -f elf32
.
Thus you're using the calling convention documented in the i386 System V ABI. It happens to be essentially the same as MS Windows CDECL, and some people call it that, but it differs from Windows in how 64-bit structs are returned for example (in memory on i386 SysV vs. in EDX:EAX in Windows cdecl.) If you say "cdecl" to describe your calling convention, with the context being Linux, most people will know that you mean i386 SysV as opposed to gcc -mregparm=3
or something, but IMO "i386 SysV" is nearly as short and much more specific / accurate as a description.
The i386 System V ABI requires 16-byte stack alignment before a call
, thus ESP % 16 == 12
on function entry after call
pushes a return address. (This applies in the ABI version used on modern Linux, due to GCC's accidental reliance -mpreferred-stack-boundary=4
which in 32-bit mode was supposed to be optional / best-effort to help performance. Documenting as a new requirement was the least-bad way out of the situation with binaries in the wild that would crash with misaligned ESP. Many other OSes using i386 System V, such as MacOS and FreeBSD, didn't adopt that change, and hand-written asm that only maintains 4-byte stack alignment is still correct there.)
Most of 32-bit library functions don't actually depend on that (e.g. they don't use movaps
for 16-byte copies for locals in stack space the way many 64-bit functions do; see glibc scanf Segmentation faults when called from a function that doesn't align RSP). So in 32-bit code it's common that it will happen to work anyway, but could break in some future Linux distro, exactly the kind of invisible bug you were asking about. Assembly language is not one where trial and error can prove your code is safe and correct.
You have 16 bytes of stack adjustment before the call, rather than 12 + 16*n
. (4 each from push ebp
and sub esp,4
, then 2 more pushes of args for scanf). You could drop the use of EBP as a frame pointer (and adjust the 2 instructions that referenced stack space to use ESP instead), or sub esp,16
instead of 4
.
You might be tempted to ask scanf to overwrite one of its args with the conversion result. That would probably be safe in practice, especially the int *
since it needs the pointer in a register to store through it, and it has no reason to read it again after. But functions own their incoming stack args, and can re-read them any number of times, and assume that no pointers alias them. scanf
could in theory be reusing its incoming arg space for its own temporaries. Or more plausibly, as args for a tailcall; reusing arg space that way is something real compilers do.
mov esp, ebp
is not redundant per-se; ESP is still 4 bytes below EBP at that point, as you can see from single-stepping your code with gdb
and watching registers change with layout reg
. To avoid needing it, you'd have to change other instructions. e.g. mov eax, [ebp - 4]
could be pop eax
.
It would not be safe to change add esp,8
to add esp,12
unless you move it after the mov eax, [ebp - 4]
. Any space below ESP can be trashed asynchronously, e.g. by a signal handler if you had one. Only x86-64 System V has a red-zone below the stack pointer that's safe from that.
There is one instruction in your code that's fully redundant, add esp, 8
. You're about to reset ESP to EBP, popping all stack allocations including the arg space. (If you'd been making multiple function calls, you could let some args accumulate or reusing their space with mov
stores instead of pushes and pops. But push
and pop
are good for machine code density.)
The format string doesn't need to be in read-write .data
; it can be in .rodata
. (Or you could push
-immediate and push esp
, but it's normal to just put strings in .rodata
even when they're tiny.)
If I was writing this for efficiency and minimalism, not doing stuff that isn't required or even useful for this specific function, but not aiming for simplicity and easiest to understand, I might have written this:
extern scanf
section .rodata
fmt: db "%d", 0 ; style: plain 0 seems appropriate to me as a terminator byte
section .text
global main ; I prefer putting global next to the label, like in C how static is with a function definition, not at the top of the file, but either is valid
main:
; ESP % 16 == 12 on entry
push -1 ; allocate int x, with a value in case the scanf conversion fails
push esp ; &x. push esp snapshots the input register before the -= 4 part of push
push fmt ; push format string on stack (4 byte address)
call scanf ; 3x 4-byte pushes, ESP % 16 == 0 ready for a call
; TODO: check EAX==1 to make sure the conversion succeeded.
add esp, 8 ; clean the args from the stack
pop eax ; load the %d conversion result. (Or the value we pushed earlier if scanf didn't write it)
; epilogue
; nothing to do here, we didn't save any call-preserved registers
ret
See What is an assembly-level representation of pushl/popl %esp? re: the details of what happens when you push esp
: it pushes the old value of ESP, snapshotted before the esp-=4
effect of the push
itself.
BTW, this is borderline code-review. Code review questions are off-topic on SO these days, but there is https://codereview.stackexchange.com/. This is so short and doing so little that I think it's ok as an SO question.