Why do BSD systems need to sub esp,4 when performing a system call?

Question

I'm performing a system call on OS X (32bit) like this:

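push 123        ; argument: exit status
mov eax, 1      ; syscall number 1 (exit)
sub esp, 4      ; the 4-byte gap in question
int 0x80        ; trap into the kernel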

And I don't quite understand that sub esp, 4 gap.

I read somewhere that BSD and its derivatives always have this gap, but couldn't find an explanation why.

My first thought was stack alignment, but that is not the case, since that line is to be found everywhere, and as far as I know OS X requires 16-byte stack alignment (which isn't the case here either).

Do you have any idea what hides behind the need to do sub esp, 4 or could point me to resources that describe it properly?

The reasoning is discussed here: https://www.freebsd.org/doc/en/books/developers-handbook/book.html#x86-system-calls — Michael Petch, Oct 31 '16 at 23:01
It's because you are expected to be calling a function, and that is the space for the return address. Let me find the duplicate. — Jester, Oct 31 '16 at 23:01
@MichaelPetch I read the very same document, but the fact that it's a replacement for `call` doesn't quite explain why that `call` is needed in the first place — mewa, Oct 31 '16 at 23:20
@Jester That's not true, since that would imply `sub esp, 4` does the same, which it doesn't. It's just a random value — mewa, Oct 31 '16 at 23:23
The call was required in the first place because of a design decision for the reason given in that article. They (kernel developers) always intended for int 0x80 to be called via a function so they had to account for the extra return address appearing before the parameters. It just so happens when writing in assembler spitting out *CALL/RET* was just extra noise, so assembler developers push a value (doesn't matter what value) on the stack to act as a placeholder for the return address and then do `int 0x80`. — Michael Petch, Oct 31 '16 at 23:26
@MichaelPetch I fully understand the consequences of this, I just don't quite get the idea of assuming such a thing. Was this assumption ever used to do something more productive than just offsetting the stack by a constant value (which seems useless)? Apart from x86 perhaps? If you had a quotation on the reasoning behind this design decision I would gladly accept that as an answer — mewa, Oct 31 '16 at 23:35
The reason is to hide the fact that this is a special system call. To the application, it should look like a normal function call. Also, the implementation might use `int 0x80` or `syscall`/`sysenter` as appropriate. To avoid having to copy the arguments, the operating system accounts for the return address. If you don't use a `call`, you have to fake this return address. — Jester, Oct 31 '16 at 23:58
So another way to put this: to make libc wrapper functions for system calls more efficient, because they can just do the `int 0x80` without copying args around. It's standard on Unix/Linux systems for system calls like `read(2)` to actually be library wrapper functions around the kernel call, rather than macros that expand to inline asm. — Peter Cordes, Nov 01 '16 at 00:50
@PeterCordes Oh okay, I get it now, thanks for emphasizing it! I'm sorry (to the previous guys) but until I read your comment this whole idea just eluded me ;) — mewa, Nov 01 '16 at 01:02

score 3 · Accepted Answer · edited May 23 '17 at 12:33

(community wiki because I'm just summarizing comments)

BSD does this to make libc wrapper functions for system calls more efficient, because they can just do the int 0x80 without copying args around. It leaves room for the return address pushed by the CALL to the wrapper function.

It's standard on Unix/Linux systems for system calls like read(2) to actually be library wrapper functions around the kernel call, rather than macros that expand to inline asm.
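To make the stack layout concrete, here is a minimal sketch in NASM syntax of the 32-bit BSD convention. The labels sys_exit, caller and inline_exit are hypothetical names chosen for illustration. It only shows that a CALL and a manual sub esp, 4 leave the argument at the same offset from esp.

```asm
; Sketch only: 32-bit BSD / OS X convention, NASM syntax.
; All label names here are illustrative assumptions.

sys_exit:                       ; libc-style wrapper: no argument copying
        mov     eax, 1          ; syscall number 1 (exit), as in the question
        int     0x80            ; kernel sees [esp]   = return address (skipped)
                                ;             [esp+4] = first argument
        ret                     ; not reached for exit; other wrappers return here

caller:                         ; normal use: go through the wrapper
        push    dword 123       ; argument
        call    sys_exit        ; CALL pushes the 4-byte return address

inline_exit:                    ; no CALL: reserve the same 4 bytes by hand
        push    dword 123       ; argument
        mov     eax, 1
        sub     esp, 4          ; placeholder where the return address would be
        int     0x80
```

In both paths the kernel finds 123 at [esp+4], which is why the value stored in the placeholder slot does not matter.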

Linux solves this problem a different way: by passing all syscall args in registers. I guess that means 32-bit wrapper functions have to load all the args from the stack, but at least they don't have to be stored and re-read by the kernel.
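For comparison, a 32-bit Linux wrapper might look roughly like the sketch below (NASM syntax). The label sys_write is a hypothetical name, and the errno handling a real libc does is left out. The point is that each argument has to be loaded from the stack into a register before int 0x80.

```asm
; Sketch only: 32-bit Linux convention (args in ebx, ecx, edx, ...).
; sys_write is an illustrative name, not a real libc symbol.

sys_write:
        push    ebx             ; ebx is callee-saved in the i386 C convention
        mov     ebx, [esp+8]    ; fd    (+4 saved ebx, +4 return address)
        mov     ecx, [esp+12]   ; buf
        mov     edx, [esp+16]   ; count
        mov     eax, 4          ; write is syscall 4 on 32-bit Linux
        int     0x80            ; result comes back in eax
        pop     ebx
        ret
```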

The x86-64 system-call ABI is much more compatible with the function calling convention: Only a single mov r10, rcx is needed, because the System V function calling convention passes args in registers (and the syscall registers are chosen to match it as closely as possible, except that the SYSCALL instruction itself destroys RCX and R11, so the kernel can't see the original values.)
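As a rough illustration of how little the x86-64 wrapper has to do, here is a sketch for a four-argument call in NASM syntax. It assumes Linux, where wait4 is syscall 61, and the label name is hypothetical. Error-to-errno translation is omitted.

```asm
; Sketch only: x86-64 Linux, four-argument system call.
; rdi, rsi and rdx already hold args 1-3 in both conventions.

sys_wait4:
        mov     r10, rcx        ; 4th arg: function ABI uses rcx, syscall ABI uses r10
        mov     eax, 61         ; wait4 on x86-64 Linux (assumed target)
        syscall                 ; overwrites rcx and r11
        ret                     ; result in rax
```

Calls with three or fewer arguments would not need the mov r10, rcx at all.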

See the x86 tag wiki for more info about what the calling conventions actually are, and links to the ABIs.

I'm aware of the other calling conventions, I just needed that 1 pointer gap space explained ;) Thanks for the links anyway! — mewa, Nov 01 '16 at 12:59
@mewa: I'm hopeful some future readers other than you will benefit at some point. :) — Peter Cordes, Nov 01 '16 at 13:07
