I understand that when we want to make a system call, we generally call a C wrapper around that system call, which internally puts the arguments in the right places (in specific registers, or sometimes on the stack).

After putting the arguments in place (and ignoring approaches like the `syscall` instruction, as it is not available on all platforms), the wrapper generally executes `int 0x80` (a software interrupt).

Now, when an interrupt is encountered, the hardware will save all the registers of the running process on its stack and perform a context switch (which will include switching the program counter and stack pointer, among other things).

I am having trouble understanding how the system call handler then accesses the arguments that were originally placed in specific registers by the C wrapper, and which now reside on the user process's stack (not on the kernel stack)?

  • The thing about `syscall` "not being available on all platforms" isn't really correct. You should use `syscall` in *all* 64-bit programs; it is guaranteed available on all 64-bit systems. Conversely, you must use `int 0x80` in all 32-bit programs. https://stackoverflow.com/questions/46087730/what-happens-if-you-use-the-32-bit-int-0x80-linux-abi-in-64-bit-code – Nate Eldredge May 22 '21 at 22:08
  • I think you're misled - the user process's registers get saved on the *kernel* stack (either by hardware or by the interrupt handler explicitly pushing them), not on the user stack. It couldn't really work any other way - the user's stack is untrusted. So the higher-level code in the kernel can simply retrieve them from its own stack. – Nate Eldredge May 22 '21 at 22:12
  • Thanks @NateEldredge. By "syscall" not being ubiquitous I meant its absence on 32-bit systems. I had already read the post you linked before posting my question. But it seems my understanding is incorrect if the registers are stored on the kernel stack. So during system call execution, the C wrapper puts the args in registers, and after "int 0x80" the hardware saves those registers on the kernel stack. The system call handler then pops the stack and restores the registers? (Meaning the push/pop is useless overhead specific to executing system calls.) "syscall", on the other hand, does not do all this, which makes it fast? – Rahul Patel May 23 '21 at 03:56
  • x86-32 does have a mechanism for hardware to save all the registers (hardware task switching) but AFAIK Linux doesn't use it. It does it in software, just executing a bunch of `push` instructions within the `int 0x80` handler. See the `SAVE_ALL` macro defined [here](https://github.com/torvalds/linux/blob/34c5c89890d6295621b6f09b18e7ead9046634bc/arch/x86/entry/entry_32.S#L205) and invoked [here](https://github.com/torvalds/linux/blob/34c5c89890d6295621b6f09b18e7ead9046634bc/arch/x86/entry/entry_32.S#L973). Then it pops them all off again before returning. – Nate Eldredge May 23 '21 at 04:12
  • The situation with `syscall` is similar, except that the instruction itself overwrites `rcx` and `r11` (to save `rip` and `rflags` instead of needing a stack to push them on). But again it's up to the kernel's handler to manually save and restore all the others. So in short, from userspace's perspective, `int 0x80` only modifies `eax`, and `syscall` only modifies `rax, rcx, r11`; there is no need for userspace to push/pop any other registers. – Nate Eldredge May 23 '21 at 04:15
  • Awesome! Thanks @NateEldredge! I think I understand the flow now. The link to the actual assembly code made it particularly clear. Since `ENTRY(entry_INT80_32)` is executed for the `int 0x80` flow, do you happen to know the entry function that is called after `syscall`? Also, could you please point me to the files for interrupt handlers (other than the `0x80` one)? – Rahul Patel May 23 '21 at 06:09
  • Yes, the `syscall` entry is [here](https://github.com/torvalds/linux/blob/4d7620341eda38573a73ab63c33423534fa38eb9/arch/x86/entry/entry_64.S#L87). (Not hard to guess that it's in the file `entry_64.S`.) – Nate Eldredge May 23 '21 at 06:13
  • Other exception handlers are in those same two files. Hardware interrupt handlers, I'm not sure. – Nate Eldredge May 23 '21 at 06:18
  • Yup, I meant hardware interrupts only. Thanks a lot anyway! – Rahul Patel May 23 '21 at 06:38
  • I think the hardware interrupt handlers may also be defined in those files, but it's buried in enough layers of macros that it's hard to be sure. – Nate Eldredge May 23 '21 at 06:51

> Now, when an interrupt is encountered, the hardware will save all the registers of the running process on its stack and perform a context switch (which will include switching the program counter and stack pointer, among other things).

Nope ;-) On Linux AMD64, syscall arguments are passed directly in registers; they are not pushed onto the user stack.

For details, see the "x86-64 Linux System Call convention" in http://refspecs.linuxfoundation.org/elf/x86_64-abi-0.99.pdf, section A.2.1 "Calling Conventions", at the bottom of page 123 and the top of page 124.

  • This is also the case for 32-bit x86 Linux; arguments are also passed in registers. However, data that is in the process's memory can easily be accessed by the kernel when needed (think data for `write(2)`, pointers to data to be filled in, etc.); the kernel is after all privileged code and can read and write all of memory. It's normally done through the `copy_from_user` kernel function. – Nate Eldredge May 22 '21 at 22:10
  • @NateEldredge Perfectly right! And these data are passed as values of class MEMORY (pointers in registers) to the kernel. – senatores conscripti May 22 '21 at 22:16