I've written a simple assembly program:

section .data
str_out db "%d ",10,0
section .text
extern printf
extern exit
global main
main:

MOV EDX, ESP
MOV EAX, EDX
PUSH EAX
PUSH str_out
CALL printf
ADD ESP, 8 ; cleanup stack (the stack grows down, so popping the 2 args means ADD)
MOV EAX, EDX
PUSH EAX
PUSH str_out
CALL printf
ADD ESP, 8 ; cleanup stack
PUSH 0 ; exit status
CALL exit

I am using the NASM assembler, and GCC to link the object file into an executable on Linux.

Essentially, this program first copies the value of the stack pointer into register EDX and then prints the contents of that register twice. However, the value printed by the second printf call does not match the first.

This behaviour seems strange. When I replace every usage of EDX in this program with EBX, the outputted integers are identical as expected. I can only infer that EDX is overwritten at some point during the printf function call.

Why is this the case? And how can I make sure that the registers I use in future don't conflict with C lib functions?

  • That one got me the first time years ago too. The answer you accepted is correct but omits `ebp` and `esp` as callee save. Those two do seem to go without saying, but you can technically mess this up. Welcome to assembly! – sqykly Dec 05 '15 at 02:43
  • @sqykly Thank you. It is certainly a lot less forgiving than the higher level languages which I am used to. But I will not be defeated by it ! :) –  Dec 05 '15 at 02:58
  • Answer as many javascript questions as I do and you will start to wonder about that. – sqykly Dec 05 '15 at 11:31
  • @sqykly: Fair point that `esp` is also call-preserved. It is possible to get this wrong if you try! I edited that into the accepted answer (since my answer didn't go into the specifics of *this* ABI, just ABIs in general). I left out any mention of EFLAGS, where the condition flags are call-clobbered. IIRC, the ABI requires the [direction flag](https://en.wikipedia.org/wiki/Direction_flag) to be cleared on function entry/exit, so memcpy doesn't have to use `CLD`. I also left out mention of vector regs and FPU control settings (rounding modes, etc.). This is why my answer just links to docs. – Peter Cordes Dec 05 '15 at 16:35

2 Answers


According to the 32-bit x86 ABI (the i386 System V ABI that Linux uses), EBX, ESI, EDI, and EBP are callee-save registers and EAX, ECX and EDX are caller-save registers.

This means that functions can freely use and destroy the previous values of EAX, ECX, and EDX. For that reason, save the values of EAX, ECX, and EDX before calling functions if you don't want them to change. That is what "caller-save" means.
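As a minimal sketch of what caller-saving looks like for the program in the question: the extra `push edx` / `pop edx` pair around the first call is the only change in approach. The build line in the comment is an assumption about a typical toolchain, and the sketch ignores the 16-byte stack alignment that newer versions of the i386 System V ABI specify at call sites.

```nasm
; Sketch only: 32-bit Linux, cdecl.
; Assumed build: nasm -f elf32 demo.asm && gcc -m32 -no-pie demo.o
section .data
str_out db "%d ", 10, 0

section .text
extern printf
global main
main:
    mov  edx, esp        ; the value we want to keep across the call

    push edx             ; caller-save: printf is allowed to clobber EDX
    push edx             ; printf argument 2: the value to print
    push str_out         ; printf argument 1: the format string
    call printf
    add  esp, 8          ; caller pops the two arguments (cdecl)
    pop  edx             ; restore our saved copy of EDX

    push edx
    push str_out
    call printf          ; prints the same value as the first call
    add  esp, 8

    xor  eax, eax        ; return 0 from main
    ret
```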

Or better, use other registers for values that you're still going to need after a function call. A push/pop of EBX at the start/end of a function is much better than a push/pop of EDX inside a loop that makes a function call. When possible, use call-clobbered registers for temporaries that aren't needed after the call. Values that are already in memory, so they don't need to be written before being re-read, are also cheaper to spill.
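A sketch of the same program keeping the value in EBX instead, assuming 32-bit Linux and cdecl as in the question. Because `main` is itself a callee, it saves EBX once on entry and restores it once before returning; no save/restore is needed around the individual `printf` calls. The build line in the comment is an assumption, and stack alignment is again ignored for brevity.

```nasm
; Sketch only: 32-bit Linux, cdecl.
; Assumed build: nasm -f elf32 demo.asm && gcc -m32 -no-pie demo.o
section .data
str_out db "%d ", 10, 0

section .text
extern printf
global main
main:
    push ebx             ; EBX is callee-save: preserve our caller's value
    mov  ebx, esp        ; keep the value in a call-preserved register

    push ebx
    push str_out
    call printf
    add  esp, 8          ; caller pops the two arguments

    push ebx             ; still intact: printf must restore EBX if it uses it
    push str_out
    call printf
    add  esp, 8

    pop  ebx             ; restore the caller's EBX before returning
    xor  eax, eax        ; return 0 from main
    ret
```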


Since EBX, ESI, EDI, and EBP are callee-save registers, functions have to restore the values to the original for any of those they modify, before returning.

ESP is also callee-saved, but you can't mess this up unless you copy the return address somewhere.

Peter Cordes
MikeCAT
  • `EBP` is also callee-save! – Peter Cordes Dec 05 '15 at 02:14
  • It's not *that* hard. `ret 8` from a function with no params messes up `esp`. Picture any kind of tail call optimization gone wrong. – sqykly Dec 06 '15 at 03:43
  • Or! Misapplied cdecl or stdcall. – sqykly Dec 06 '15 at 03:47
  • @sqykly: I was going to say "you can't mess this up unless you try", when I put in that last paragraph into Mike's answer. I guess I should have gone with that, instead of the one specific way I thought of. Caller-pops is more efficient, because the caller could just leave the stack pointer where it is and overwrite things to set up for another call, instead of popping and then pushing. `ret imm` is only useful when optimizing for code size, since it's 3 uops on Intel SnB-family microarchitectures. (ret, sync stack engine, modify rsp). If they cared, they could prob. make it 1 uop. – Peter Cordes Dec 06 '15 at 15:37
  • @petercordes it's also nice for people who are too lazy to do that optimization and use stdcall, but not quite too lazy to write assembly. – sqykly Dec 06 '15 at 15:41
  • @sqykly: I mostly meant that I didn't even think of `ret imm` existing. There's no use for it in 64bit mode at all, because the standard ABI is register-call. 32bit should just go die already, and take its legacy ABI baggage with it. – Peter Cordes Dec 06 '15 at 15:51

The ABI for the target platform (e.g. 32bit x86 Linux) defines which registers can be used by functions without saving. (i.e., if you want them preserved across a call, you have to do it yourself).

Links to ABI docs for Windows and non-Windows, 32 and 64bit, are at https://stackoverflow.com/tags/x86/info

Having some registers that aren't preserved across calls (available as scratch registers) means functions can be smaller. Simple functions can often avoid doing any push/pop save/restores. This cuts down on the number of instructions, leading to faster code.
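For illustration, here is a hypothetical leaf function (`add3` is an invented name, not something from the question) written for the 32-bit cdecl convention. It does all of its work in the scratch registers EAX, ECX, and EDX, so it needs no push/pop at all.

```nasm
; Sketch only: int add3(int a, int b, int c), 32-bit cdecl.
; add3 is a hypothetical example function.
section .text
global add3
add3:
    mov  eax, [esp+4]    ; a (first argument, just above the return address)
    mov  ecx, [esp+8]    ; b
    mov  edx, [esp+12]   ; c
    add  eax, ecx        ; only call-clobbered registers are touched,
    add  eax, edx        ; so nothing has to be saved or restored
    ret                  ; result in EAX; the caller pops the arguments
```

Had the function used EBX, ESI, EDI, or EBP instead, it would have needed a matching push and pop for each one.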

It's important to have some of each: having to spill all state to memory across calls would bloat the code of non-leaf functions and slow things down, especially in cases where the called function didn't touch all the registers.

See also "What are callee and caller saved registers?" for more about call-preserved vs. call-clobbered registers in general.

Peter Cordes
  • That last paragraph sounds funny. If you had to save all state to memory, the leaf functions are exactly the ones it would bloat. The non-leaf functions essentially do bloat either way, since they're both a caller and a callee. – Daniel Stevens Dec 05 '15 at 05:52
  • @DanielStevens: the last paragraph is talking about the case where all registers are clobbered, like the xmm regs are in the SysV 64bit ABI. Leaf functions don't have to save anything. Also: non-leaf functions often have enough callee-saved registers to keep a few key pieces of state in regs, and were mostly using the caller-save regs as scratch space to compute the function-call parameters. You only need to save/restore a reg if you still need it after the function call. Typically you need a couple things, like a loop counter and a pointer or two, but can reload other stuff. – Peter Cordes Dec 05 '15 at 12:36
  • But you were talking about spilling all state to memory, not letting it get clobbered. – Daniel Stevens Dec 05 '15 at 15:24
  • @DanielStevens: err yeah, but all-regs=caller-save bloats *non*-leaf functions, while all-regs=callee-save bloats leaf functions. My last paragraph is talking about the first case (all-regs=caller-save). I think you are talking about the all-regs=callee-save case (which my 2nd-last paragraph is about). – Peter Cordes Dec 05 '15 at 16:12
  • That makes more sense now. Reading it, it sounds like your last two paragraphs were about having scratch registers versus not having them, rather than any difference between who saves non-scratch registers. – Daniel Stevens Dec 06 '15 at 08:26
  • @DanielStevens: reworded to hopefully avoid confusion for others. Thanks. – Peter Cordes Dec 06 '15 at 12:15