I'm investigating the implementation details of seccomp-bpf, the syscall filtering mechanism introduced in Linux 3.5. I have been reading kernel/seccomp.c from Linux 3.10 and have a question about it.
From seccomp.c, it seems that __secure_computing() calls seccomp_run_filters() to check the syscall made by the current process. However, inside seccomp_run_filters() the syscall number passed as an argument is never used.
sk_run_filter() appears to be the implementation of the BPF filter machine, yet seccomp_run_filters() calls it with the first argument (the buffer to run the filter on) set to NULL.
My question is: how can seccomp_run_filters() filter syscalls without using its syscall argument and without passing any buffer to sk_run_filter()?
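For reference, this is roughly what a filter looks like from the user-space side. It is only a sketch, assuming the uapi headers linux/filter.h and linux/seccomp.h are available; the name example_filter and the choice of getpid as the blocked syscall are made up for illustration. The point is that the program addresses the syscall number as an offset into struct seccomp_data, not as bytes of a packet buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical filter: fail getpid() with EPERM, allow everything else. */
struct sock_filter example_filter[] = {
    /* A = seccomp_data.nr (the syscall number, at offset 0) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* if (A == SYS_getpid) fall through, else skip one instruction */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getpid, 0, 1),
    /* return ERRNO action with EPERM in the data bits */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* default: allow the syscall */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Number of instructions, as it would go into struct sock_fprog.len. */
const size_t example_filter_len =
    sizeof(example_filter) / sizeof(example_filter[0]);
```

Such an array would normally be wrapped in a struct sock_fprog and installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...); that step is left out here.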
The following is the source code of seccomp_run_filters():
/**
 * seccomp_run_filters - evaluates all seccomp filters against @syscall
 * @syscall: number of the current system call
 *
 * Returns valid seccomp BPF response codes.
 */
static u32 seccomp_run_filters(int syscall)
{
	struct seccomp_filter *f;
	u32 ret = SECCOMP_RET_ALLOW;

	/* Ensure unexpected behavior doesn't result in failing open. */
	if (WARN_ON(current->seccomp.filter == NULL))
		return SECCOMP_RET_KILL;

	/*
	 * All filters in the list are evaluated and the lowest BPF return
	 * value always takes priority (ignoring the DATA).
	 */
	for (f = current->seccomp.filter; f; f = f->prev) {
		u32 cur_ret = sk_run_filter(NULL, f->insns);
		if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
			ret = cur_ret;
	}
	return ret;
}
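To make sure I read the loop correctly, here is a small user-space sketch of the precedence rule it applies: each filter's return value is compared on its action bits only, and the lowest action wins. The helper name combine_filter_results is hypothetical, and the constants are copied by hand from include/uapi/linux/seccomp.h as of Linux 3.10 rather than included, so treat them as an assumption if you are on a different version.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from include/uapi/linux/seccomp.h (Linux 3.10). */
#define SECCOMP_RET_KILL   0x00000000U /* kill the task */
#define SECCOMP_RET_TRAP   0x00030000U /* deliver SIGSYS */
#define SECCOMP_RET_ERRNO  0x00050000U /* fail the syscall with an errno */
#define SECCOMP_RET_TRACE  0x7ff00000U /* notify a tracer */
#define SECCOMP_RET_ALLOW  0x7fff0000U /* let the syscall through */
#define SECCOMP_RET_ACTION 0x7fff0000U /* mask: action bits */
#define SECCOMP_RET_DATA   0x0000ffffU /* mask: data bits */

/*
 * Hypothetical helper mirroring the loop in seccomp_run_filters():
 * results[] stands in for the values each filter would return.
 * Unlike the kernel function, an empty list yields ALLOW here,
 * because the WARN_ON/KILL check is not modelled.
 */
uint32_t combine_filter_results(const uint32_t *results, size_t count)
{
    uint32_t ret = SECCOMP_RET_ALLOW;

    for (size_t i = 0; i < count; i++) {
        /* Compare action bits only; the data bits are ignored. */
        if ((results[i] & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
            ret = results[i];
    }
    return ret;
}
```

For example, with results of ALLOW, ERRNO|1 and TRAP, the helper returns TRAP, since TRAP has the lowest action value of the three. This only models how the results are combined; it says nothing about how each filter obtains the syscall number, which is the part I am asking about.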