I am creating a syscall tracer using seccomp. I don't change anything in the system call, I just log it in my structure and when the process finishes - I dump this structure on a disk.
When I run my program like this (it's called tracer):
tracer env
Everything works well, and I see the logs in the file after.
However, if I try to trace a program which calls execve
inside, it fails:
tracer watch -n1 env
or
tracer strace -o /tmp/log env
fails with the stdout
env: error while loading shared libraries: cannot create cache for search path: Cannot allocate memory
and the log:
$ cat /tmp/log
execve("/usr/bin/env", ["env"], [/* 19 vars */]) = 0
brk(NULL) = 0x415000
mmap(0xffffffffffffffda, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2
writev(103, [{iov_base="env", iov_len=3}, {iov_base=": ", iov_len=2}, {iov_base="error while loading shared libraries", iov_len=36}, {iov_base=": ", iov_len=2}, {iov_base="", iov_len=0}, {iov_base="", iov_len=0}, {iov_base="cannot create cache for search path", iov_len=35}, {iov_base=": ", iov_len=2}, {iov_base="Cannot allocate memory", iov_len=22}, {iov_base="\n", iov_len=1}], 10) = 127
+++ exited with 127 +++
Notice the weird mmap
address and its return value. I don't understand what is wrong and why does this happen. Any other program works fine, so I guess the problem is with copying seccomp
filters to the forked process which calls execve
.
Here are my seccomp
rules:
struct sock_filter filter[] = {
BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_openat, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_write, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_mmap, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_mprotect, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_close, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
};
I don't list the whole code as it is obvious and can be only written in a single way, also, it is written in the article I referred to above. The problem is also known in the Internet but I was not able to find any solution. If you still insist on the whole code (I doubt that) or MCVE, I can provide it.
Also, when I add the execve
trace I have different behavior:
struct sock_filter filter[] = {
BPF_STMT(BPF_LD + BPF_W + BPF_ABS, offsetof(struct seccomp_data, nr)),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_openat, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_write, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_mmap, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_mprotect, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_close, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, __NR_execve, 0, 1),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_TRACE),
BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
};
The log becomes:
$ cat /tmp/log
execve(0xffffffffffffffda, ["env"], [/* 19 vars */]) = -1 ENOSYS (Function not implemented)
getpid() = 15535
exit_group(1) = ?
+++ exited with 1 +++
Linux 4.4 aarch64, Linux 4.15 x86-64
The more time I spend on this problem, the more I realize that the problem is actually in the kernel's source code. It copies the filters from one process to another, child one but they don't copy the implementation, and so all of the SECCOMP_RET_TRACE
rules are copied and there is no tracer in the child, so every system call in the subchild returns -ENOSYS
as there is no tracer there, however, the rules are copied.