
I'm writing a kernel module that intercepts syscalls. Intercepting, or rather just replacing the real syscall address with a fake syscall address, is as easy as 1-2-3 in plain C. But I'd like to know how it works at a low level.

(let's pretend I'm on x86)

First of all, I'm doing just a basic test: I'm allocating a small chunk of executable kernel memory and filling it with these opcodes:

0xB8, 0x00, 0x00, 0x00, 0x00,          //mov eax, &real_syscall_function;
0xFF, 0xE0,                            //jmp eax;

Inserting the module and replacing the syscall works just perfect.
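For illustration, the 7-byte chunk above could be filled from C along these lines. This is only a sketch: `build_jmp_trampoline` and `TRAMPOLINE_SIZE` are hypothetical names, the target is assumed to be a 32-bit address, and allocating executable kernel memory is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TRAMPOLINE_SIZE = 7 };

/* Hypothetical helper: writes `mov eax, imm32; jmp eax` into buf.
 * buf must have room for TRAMPOLINE_SIZE bytes. Returns the number
 * of bytes written. */
static size_t build_jmp_trampoline(uint8_t *buf, uint32_t target)
{
    assert(buf != NULL);

    buf[0] = 0xB8;                             /* mov eax, imm32 */
    buf[1] = (uint8_t)(target & 0xFF);         /* imm32 is stored little-endian */
    buf[2] = (uint8_t)((target >> 8) & 0xFF);
    buf[3] = (uint8_t)((target >> 16) & 0xFF);
    buf[4] = (uint8_t)((target >> 24) & 0xFF);
    buf[5] = 0xFF;                             /* jmp eax */
    buf[6] = 0xE0;

    return TRAMPOLINE_SIZE;
}
```

The four zero bytes in the listing above are the placeholder that gets overwritten with the address of the real syscall function.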

Now, according to this SO answer, arguments are passed in the registers. I want to check this, so I create an executable chunk of memory and fill it with this code:

0x55,                                  //push ebp;
0x89, 0xE5,                            //mov ebp, esp;
0x83, 0xEC, 0x20,                      //sub esp, 32; 

0xB8, 0x00, 0x00, 0x00, 0x00,          //mov eax, &real_syscall_function;
0xFF, 0xE0,                            //jmp eax;

0x89, 0xEC,                            //mov esp, ebp;
0x5D,                                  //pop ebp;
0xC3                                   //ret;

This should work too, since I'm not touching any of the argument registers, I'm just playing with the stack. But it doesn't work. That makes me think the arguments are actually passed on the stack. But why? Am I misunderstanding the SO answer I linked to? Aren't args supposed to be in the registers when a syscall is called?

Extra question: Why does jmp eax work when call eax doesn't? (This applies to both the first and the second example.)

Edit: I'm sorry, the comments in the ASM code were a bit off. What I'm jumping to is the address of the real syscall function.

Edit 2: I think it's obvious, but I'll explain it anyway in case somebody doesn't understand what I'm doing. I'm allocating a small executable chunk of memory, filling it with the opcodes shown above, and then making a given syscall table entry (let's say __NR_read) point to the address of that chunk.


works just perfect == system keeps running without problems. It means the real syscall is being called from the fake syscall

it doesn't work == system crashes because the fake syscall isn't calling the real syscall
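The "plain C" replacement mentioned at the top can be modelled in user space as swapping one entry in a table of function pointers. The sketch below is only a model: `demo_table`, `demo_real_read`, `demo_hooked_read` and the other `demo_`/hook names are hypothetical and are not the kernel's `sys_call_table` or `sys_read`. The point it illustrates is that when the hook is a C function with the same signature as the original, the compiler takes care of where the arguments live.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as a read-like handler: (fd, buf, count) -> bytes read. */
typedef long (*syscall_fn)(unsigned int fd, char *buf, size_t count);

enum { DEMO_NR_READ = 0, DEMO_NR_MAX = 1 };

/* Stand-in for the real handler: fills buf with 'x' and reports count. */
static long demo_real_read(unsigned int fd, char *buf, size_t count)
{
    (void)fd;
    for (size_t i = 0; i < count; i++)
        buf[i] = 'x';
    return (long)count;
}

/* Stand-in for the syscall table. */
static syscall_fn demo_table[DEMO_NR_MAX] = { demo_real_read };

static syscall_fn saved_read;     /* original entry, kept for forwarding */
static unsigned long hook_hits;   /* how many calls went through the hook */

/* The fake handler: same signature, so it forwards the arguments untouched. */
static long demo_hooked_read(unsigned int fd, char *buf, size_t count)
{
    hook_hits++;
    return saved_read(fd, buf, count);
}

static void install_hook(void)
{
    assert(demo_table[DEMO_NR_READ] != demo_hooked_read);
    saved_read = demo_table[DEMO_NR_READ];
    demo_table[DEMO_NR_READ] = demo_hooked_read;
}

static void remove_hook(void)
{
    demo_table[DEMO_NR_READ] = saved_read;
}
```

After `install_hook()`, a call through `demo_table[DEMO_NR_READ]` reaches `demo_hooked_read` first and then the original; `remove_hook()` restores the entry.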

alexandernst
  • Define what exactly you call a syscall. In general, they go thru the VDSO (using e.g. `SYSENTER` there). but the C function (e.g. `mmap`) use the stack (and pass arguments to registers). Read the Linux Assembly HowTo and the ABI specification. – Basile Starynkevitch Jan 04 '14 at 15:39
  • @BasileStarynkevitch By `syscall` I literally mean a `syscall`: the functions pointed to by the syscall table in the Linux kernel. Btw, I'm currently trying the code with `__NR_read`. – alexandernst Jan 04 '14 at 15:42

2 Answers


Syscall parameters are first passed from userspace in registers to the system_call() function, which is in essence a common syscall dispatcher. However, system_call() then calls the real system call functions, such as sys_read(), in the usual manner, passing parameters on the stack. Therefore, messing with the stack leads to a crash. Also, see this SO answer: https://stackoverflow.com/a/10459713 and a very detailed explanation on Quora: http://www.quora.com/Linux-Kernel/What-does-asmlinkage-mean-in-the-definition-of-system-calls#step=6 (registration required).
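To make the stack argument concrete, here is a small user-space model of the three stubs from the question. It is a sketch under stated assumptions, not kernel code: it assumes the real syscall reads its first argument at [esp+4] (stack-passed arguments, with the return address at [esp]), and every name and constant in it (`first_arg_seen_by_callee`, `CALLER_RET`, `GARBAGE`, and so on) is made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64, ENTRY_ESP = 32, LOCAL_WORDS = 8 /* sub esp, 32 */ };

#define GARBAGE    0xDEADBEEFu  /* stands in for uninitialised stack memory */
#define CALLER_RET 0xC0100000u  /* made-up return address into the dispatcher */
#define STUB_RET   0xF8000007u  /* made-up return address into the stub */
#define SAVED_EBP  0xC0FFEE00u  /* made-up frame pointer value */

enum stub_kind { STUB_JMP, STUB_CALL, STUB_PROLOGUE_JMP };

/* Returns what the real syscall would read as its first argument ([esp+4])
 * after the given stub has transferred control to it. */
static uint32_t first_arg_seen_by_callee(enum stub_kind kind)
{
    uint32_t stack[STACK_WORDS];
    int esp = ENTRY_ESP;  /* index of the word esp points at; lower index = lower address */

    for (int i = 0; i < STACK_WORDS; i++)
        stack[i] = GARBAGE;

    /* Layout on entry to the stub, as the dispatcher's call left it. */
    stack[esp]     = CALLER_RET;  /* [esp]    return address */
    stack[esp + 1] = 3;           /* [esp+4]  arg 1: fd      */
    stack[esp + 2] = 0x1000;      /* [esp+8]  arg 2: buf     */
    stack[esp + 3] = 40;          /* [esp+12] arg 3: count   */

    switch (kind) {
    case STUB_JMP:                /* jmp eax: esp is untouched */
        break;
    case STUB_CALL:               /* call eax: pushes one more return address */
        stack[--esp] = STUB_RET;
        break;
    case STUB_PROLOGUE_JMP:       /* push ebp; mov ebp, esp; sub esp, 32; jmp eax */
        stack[--esp] = SAVED_EBP;
        esp -= LOCAL_WORDS;
        break;
    }

    assert(esp >= 0 && esp + 1 < STACK_WORDS);
    return stack[esp + 1];        /* the callee's view of its first argument */
}
```

In this model only `STUB_JMP` hands the callee the real fd. With `STUB_CALL` every argument is shifted by one slot, so the callee sees the dispatcher's return address as its first argument; with `STUB_PROLOGUE_JMP` it reads uninitialised local space.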

Guy Avraham
  • I'm not really sure this is true for x64. I have another kernel module specifically written for x64 which hooks syscalls (sys_read) and the arguments are passed via registers (and I'm completely sure about that because the kernel module is completely working and has been tested several times). Anyways, going back to my question: Where should I look for the arguments if I'm hooking a syscall from a kernel module? In the stack? – alexandernst Jan 04 '14 at 18:20
  • So... Let's say I want to keep my code (the second one). How could I get the args from the caller and place them in my current stack? (because I'm moving the stack). – alexandernst Jan 04 '14 at 18:41
  • Deleted my previous comment since the part about printk doesn't really make sense. Just to summarize: arguments are passed via registers to system_call() function but *not* to particular system calls like sys_read(). Therefore, second version of code doesn't make sense: it corrupts the stack and jumps to original system call which results in immediate crash. –  Jan 04 '14 at 19:03
  • Also, good read about hooking: http://poppopret.org/2013/01/07/suterusu-rootkit-inline-kernel-function-hooking-on-x86-and-arm/ (not only system calls but actually almost any kernel function). –  Jan 04 '14 at 19:09
  • This is not quite answering my question though... Look at https://github.com/alexandernst/procmon/blob/master/procmon_kmodule/sct_hook.c#L118 as you see I'm hooking both x86 and x64 syscalls. Not the system_call() but each syscall individually. That means that what I'm trying to do is possible. I'm just not sure *how* exactly is it done. If the args are on the stack, how can I allocate my own stack and then copy the args? If they are not on the stack, where are they? – alexandernst Jan 04 '14 at 19:29
  • I think I answered all the questions in the topic, namely: But why? You are corrupting the stack and then jumping to the function, it naturally crashes. Am I understand the SO answer I linked to wrong? It was about system_call() function but not about particular syscalls. Aren't args supposed to be in the registers when a syscall is called? At the moment it is called from userspace, args are in the registers, at the moment actual syscall function starts execution, they are on the stack. Extra question: Why using jmp eax works, but call eax doesn't work? It modifies the stack. –  Jan 04 '14 at 19:41
  • Not sure what do you mean by allocating another stack and copying args but it seems to be an offtopic to this question. –  Jan 04 '14 at 19:42
  • Ah, I see where are we misunderstanding each other. I was talking about hooking/hijacking a specific syscall, not the system_call() itself. Maybe you didn't see my edits? – alexandernst Jan 04 '14 at 20:00
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/44517/discussion-between-ivan-and-alexandernst) –  Jan 04 '14 at 20:04
  • Please see reply in chat :) – alexandernst Jan 05 '14 at 00:41

Adding example code to user708549's answer above.

Consider this minimal, naive C function that takes a file descriptor (int) and reads some number of bytes (40 in this example) from that file.

// funcToCallReadSystemCall.c file's content
#include<unistd.h>

void callReadSystemCall(const int fileDescriptor)
{
    int sz = 0;
    char buff[64] = {0};
    sz = read(fileDescriptor, buff, 40);
}

The assembly code (abbreviated) for this function looks like this:

funcToCallReadSystemCall.c:6:   int sz = 0;
001a C745AC00       movl    $0, -84(%rbp)   #, sz
  000000
funcToCallReadSystemCall.c:7:   char buff[64] = {0};
0021 48C745B0       movq    $0, -80(%rbp)   #, buff
  00000000 
0029 48C745B8       movq    $0, -72(%rbp)   #, buff
  00000000 
0031 48C745C0       movq    $0, -64(%rbp)   #, buff
  00000000 
0039 48C745C8       movq    $0, -56(%rbp)   #, buff
  00000000 
0041 48C745D0       movq    $0, -48(%rbp)   #, buff
  00000000 
0049 48C745D8       movq    $0, -40(%rbp)   #, buff
  00000000 
0051 48C745E0       movq    $0, -32(%rbp)   #, buff
  00000000 
0059 48C745E8       movq    $0, -24(%rbp)   #, buff
  00000000 
funcToCallReadSystemCall.c:8:   sz = read(fileDescriptor, buff, 40);
0061 488D4DB0       leaq    -80(%rbp), %rcx #, tmp88
0065 8B459C         movl    -100(%rbp), %eax    # fileDescriptor, tmp89
0068 BA280000       movl    $40, %edx   #,
  00
006d 4889CE         movq    %rcx, %rsi  # tmp88,
0070 89C7           movl    %eax, %edi  # tmp89,
0072 E8000000       call    read@PLT    #
  00
0077 8945AC         movl    %eax, -84(%rbp) # _1, sz

Notes about the above assembly code:

movl $0, -84(%rbp) #, sz – this line sets sz to zero. Note that its location is 84 bytes below the frame base pointer (rbp).

movq $0, -80(%rbp) #, buff – these lines set the buff buffer of chars to zero. The buff char array starts 80 bytes below the frame base pointer and spans 64 bytes.

leaq -80(%rbp), %rcx #, tmp88 – this line loads the address (location) of the buff array into the rcx register.

movl -100(%rbp), %eax # fileDescriptor, tmp89 – this line copies the value of the function's argument fileDescriptor into the eax register.

movl $40, %edx #, – this line copies (moves) the value 40 into the edx register.

movq %rcx, %rsi # tmp88, – this line copies (moves) the content of the rcx register into the rsi register.

movl %eax, %edi # tmp89, – this line copies (moves) the value of the eax register into the edi register. Recall that eax was set earlier to the file descriptor number.

At this point, the registers hold the following values:

edi (rdi): the file descriptor value (integer)

rsi: the pointer to the buffer into which the read will be done

edx (rdx): the number of bytes to read.

–> According to the x86_64 System V function calling convention, the first six integer arguments are passed in the following registers: rdi, rsi, rdx, rcx, r8, r9, in this order. The Linux x86_64 system call convention is the same except that the fourth argument goes in r10 instead of rcx.

call read@PLT # – this line is the x86_64 instruction that calls the read() function through the Procedure Linkage Table (PLT). The PLT is a table of function stubs used to call functions in shared libraries.

movl %eax, -84(%rbp) # _1, sz – this line copies (moves) the value from the eax register into sz. This is because, in both the x86_64 function and system call calling conventions, the return value comes back in rax (eax for a 32-bit int).

Note that the reason the arguments are passed in (dedicated) registers is that this transition moves execution from user mode into kernel mode, and the kernel-mode stack is NOT contiguous with the user-mode stack.
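The listing above stops at the call into libc, so the user/kernel boundary itself is not visible in it. As a rough sketch of what that boundary might look like on x86_64 Linux, the hypothetical helper `raw_read` below places the arguments in rdi, rsi and rdx by hand and executes the `syscall` instruction; on other targets it falls back to libc's `syscall()` wrapper. It assumes GCC or Clang extended inline assembly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: issues read(2) without going through libc's read().
 * On the x86_64 path a failure is returned as a negative errno value,
 * while the fallback path returns -1 and sets errno. */
static long raw_read(int fd, void *buf, unsigned long count)
{
    assert(buf != NULL);
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    /* rax = syscall number; rdi, rsi, rdx = arguments 1 to 3.
     * The syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_read), "D"((long)fd), "S"(buf), "d"(count)
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_read, fd, buf, count);
#endif
}
```

For a three-argument call such as read the register assignment matches the function-call convention; a difference would only show up from the fourth argument onward.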

The command I used to generate the above assembly code was:

gcc -g -O0 -c -fverbose-asm -Wa,-adhln funcToCallReadSystemCall.c > funcToCallReadSystemCall.lst

Here funcToCallReadSystemCall.c is the source file that contains the above C function, and the output is written to the funcToCallReadSystemCall.lst (text) file.

Guy Avraham
  • `call read@PLT` is a call to a user-space function in libc. It follows the standard function-calling conventions, hiding the user/kernel interface if it had been different. (Like for system calls with 4 or more args, where the 4th arg goes in R10 for system calls vs. RCX for function calls). – Peter Cordes Feb 25 '23 at 23:29
  • Passing the first 6 args in registers for all function calls is done because that's more efficient, not from necessity. For example, FreeBSD for 32-bit x86 passes stack args to `int 0x80` system calls on the user-space stack, with the kernel using the saved user-space stack pointer to find the args. Linux makes it simpler for the kernel by passing system call args in registers, but 32-bit x86 Linux does still pass args on the stack for function calls, including the libc `read` wrapper. – Peter Cordes Feb 25 '23 at 23:30
  • BTW, you don't need to initialize `char buff[64] = {0};`. The system call only stores to that memory, it doesn't load the old contents. The `read` return value tells you how many bytes were written. Normally a `read` wrapper would return at least the length of characters you discarded, i.e. return the `read` return value. The buffer and length would usually be a pointer and size_t arg that the caller supplies. Your function that discards all the `read` results before returning is weird. – Peter Cordes Feb 25 '23 at 23:33