
I'm learning eBPF and experimenting with it while following the docs, but there's something that isn't working and I don't understand why.

I have this very simple C program that exits with status 5.

#include <stdlib.h>

int main() {
   exit(5);   /* terminates the process with status 5 */
   return 0;  /* never reached */
}

The exit function in the code above calls the exit_group syscall, as we can see with strace (image below). Yet in my Python code, which uses eBPF through bcc, the output of my bpf_trace_printk is the value 208682672 and not the value 5 that exit_group is called with, as I was expecting...

[Image: strace output]

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void my_exit(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='my_exit')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

I've been investigating this problem for a few days now and have gone through everything I could find online, but I couldn't find a solution.

I truly appreciate any help available and thank you!

2 Answers


I am not sure your probe function should take 3 arguments; that seems like too many. In any case, the struct pt_regs *ctx you have should already hold all the information you need. You should be able to read any register value through the dedicated macros (PT_REGS_xxx) or by accessing the structure fields manually.

The first syscall argument can be extracted with PT_REGS_PARM1:

    bpftext = """
    #include <uapi/linux/ptrace.h>

    void my_exit(struct pt_regs *ctx){
        bpf_trace_printk("%ld\\n", PT_REGS_PARM1(ctx));
    }
    """
Marco Bonelli
  • Hi, thank you for your time, while copy & pasting my code I didn't notice the extra argument, it was meant to be only 2 arguments (it's fixed now). Using your eBPF code I still get the same return which is `(b'my_c_app', 74837, 0, b'd..31', 10002.827309, b'208387760')` – somerandomdrunkcoder May 25 '23 at 08:23
  • @somerandomdrunkcoder So both `PT_REGS_PARM1(ctx)` and `status` have the same wrong value when you print them with `bpf_trace_printk()`? – Marco Bonelli May 25 '23 at 12:44
  • yes exactly... replacing the previous eBPF code with the following ```void my_exit(struct pt_regs *ctx, int status){ bpf_trace_printk("%d", PT_REGS_PARM1(ctx)); }``` returns `(b'my_c_app', 75657, 0, b'd..31', 10876.842539, b'209174192 - 209174192')`. The values that get printed keep changing every time my C app is ran... @Marco Bonelli – somerandomdrunkcoder May 25 '23 at 15:00

Fix

You need to rename your function from my_exit to syscall__exit_group.

Why does this matter? BPF programs named in this way get special handling from BCC. Here's what the documentation says:

8. system call tracepoints

Syntax: syscall__SYSCALLNAME

syscall__ is a special prefix that creates a kprobe for the system call name provided as the remainder. You can use it by declaring a normal C function, then using the Python BPF.get_syscall_fnname(SYSCALLNAME) and BPF.attach_kprobe() to associate it.

Arguments are specified on the function declaration: syscall__SYSCALLNAME(struct pt_regs *ctx, [, argument1 ...]).

For example:

int syscall__execve(struct pt_regs *ctx,
    const char __user *filename,
    const char __user *const __user *__argv,
    const char __user *const __user *__envp)
{
    [...]
}

This instruments the execve system call.

Source.

Corrected Code

from bcc import BPF

def main():
    bpftext = """
    #include <uapi/linux/ptrace.h>

    void syscall__exit_group(struct pt_regs *ctx, int status){
        bpf_trace_printk("%d", status);
    }
    """

    bpf = BPF(text=bpftext)
    fname = bpf.get_syscall_fnname('exit_group')
    bpf.attach_kprobe(event=fname, fn_name='syscall__exit_group')

    while True:
        print(bpf.trace_fields())


if __name__ == '__main__':
    main()

Output from the sample program exiting:

(b'<...>', 14896, 0, b'd...1', 3996.079261, b'5')

How it Works

BCC rewrites your BPF program before compiling it, and the function name changes how the arguments are interpreted. You can use bpf = BPF(text=bpftext, debug=bcc.DEBUG_PREPROCESSOR) to see how your code is transformed (this needs import bcc in addition to from bcc import BPF).
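As a rough sketch of that debugging step, the snippet below wraps the call in a small helper. The helper name `dump_rewritten_source` and the constant `BPF_TEXT` are made up for illustration; `BPF` and `DEBUG_PREPROCESSOR` are the names bcc exports. The import is done inside the function so the file can be loaded on a machine without bcc.

```python
# BPF program from the question, renamed to use the syscall__ prefix.
# A raw string keeps the "\n" intact for the C compiler.
BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

void syscall__exit_group(struct pt_regs *ctx, int status) {
    bpf_trace_printk("%d\n", status);
}
"""


def dump_rewritten_source(text=BPF_TEXT):
    """Compile `text` with the preprocessor debug flag set (hypothetical helper).

    bcc prints the rewritten C source while compiling. This needs bcc
    installed and usually root privileges.
    """
    # Imported lazily so this module loads even where bcc is missing.
    from bcc import BPF, DEBUG_PREPROCESSOR

    return BPF(text=text, debug=DEBUG_PREPROCESSOR)
```

Calling `dump_rewritten_source()` as root should print the transformed source, which you can compare against the two listings that follow.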

Here's what happens without the syscall__ prefix:

void my_exit(struct pt_regs *ctx){
 int status = ctx->di;
        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

This reads the RDI register directly and treats its value as the syscall argument.

On the other hand, here's what happens if it's named syscall__exit_group:

void syscall__exit_group(struct pt_regs *ctx){
#if defined(CONFIG_ARCH_HAS_SYSCALL_WRAPPER) && !defined(__s390x__)
 struct pt_regs * __ctx = ctx->di;
 int status; bpf_probe_read(&status, sizeof(status), &__ctx->di);
#else
 int status = ctx->di;
#endif

        ({ char _fmt[] = "%d"; bpf_trace_printk_(_fmt, sizeof(_fmt), status); });
    }

If CONFIG_ARCH_HAS_SYSCALL_WRAPPER is defined (it is on x86_64), the RDI register is interpreted as a pointer to a second struct pt_regs, and the RDI field of that inner structure is read with bpf_probe_read(). That inner RDI value is the first argument to exit_group().

On systems without syscall wrappers, this does the same thing as the previous example.
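To make the difference concrete, here is a pure-Python model of the two rewritten functions above. It is only a simulation: the dictionaries stand in for registers and kernel memory, and the address `INNER_REGS_ADDR` is hypothetical, chosen so that its low 32 bits match the 208682672 from the question. It suggests that the "wrong" value is most likely a kernel pointer truncated to an int by `%d`.

```python
import ctypes

# Hypothetical kernel address of the inner struct pt_regs (illustration only).
INNER_REGS_ADDR = 0xFFFFC9000C703EB0

# Simulated kernel memory: address -> saved user-space registers.
kernel_memory = {INNER_REGS_ADDR: {"di": 5}}

# Registers seen by the kprobe on the syscall wrapper:
# di holds a pointer to the inner pt_regs, not the status itself.
outer_ctx = {"di": INNER_REGS_ADDR}


def to_int32(value):
    """Truncate to a signed 32-bit int, like assigning to a C `int`."""
    return ctypes.c_int32(value & 0xFFFFFFFF).value


def read_plain(ctx):
    """Model of my_exit: int status = ctx->di."""
    return to_int32(ctx["di"])


def read_wrapped(ctx, memory):
    """Model of syscall__exit_group: follow ctx->di, then read its di."""
    inner_regs = memory[ctx["di"]]
    return to_int32(inner_regs["di"])


print(read_plain(outer_ctx))                   # truncated pointer
print(read_wrapped(outer_ctx, kernel_memory))  # real exit status
```

The pointer changes between runs because the inner pt_regs lives at a different address each time, which would also explain why the printed value kept changing.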

Nick ODell
  • Thanks, it works! So if I understood correctly, to create a normal kprobe, the `kprobe__` prefix is unnecessary if we use attach_kprobe, but the `syscall__` prefix is required to create a syscall kprobe right? – somerandomdrunkcoder May 27 '23 at 10:33
  • I can get the syscalls through the `strace` command or through the [syscalls man pages](https://man7.org/linux/man-pages/man2/syscalls.2.html)... Can you point me to a way to find out how I can find the available non-syscalls kprobes without having to look into the [Linux source code](https://github.com/torvalds/linux/blob/master/arch/x86/kernel/kprobes/core.c)? Thank you a lot! – somerandomdrunkcoder May 27 '23 at 10:43
  • @somerandomdrunkcoder The command `sudo bpftrace -l '*'` can be used to list every kprobe and tracepoint the system has. – Nick ODell May 27 '23 at 19:49
  • @somerandomdrunkcoder Right. – Nick ODell May 27 '23 at 20:22
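If bpftrace is not installed, a rough alternative is to filter the kernel's list of traceable functions yourself. The sketch below assumes the list lives at `/sys/kernel/debug/tracing/available_filter_functions` (on some systems tracefs is mounted at `/sys/kernel/tracing` instead) and that each line is a function name optionally followed by a `[module]` tag. The helper `matching_functions` is made up for this example, and the file lists ftrace-traceable functions, so treat the result as an approximation of what kprobes can attach to.

```python
import re

# Assumed path; adjust if tracefs is mounted elsewhere on your system.
FILTER_FUNCTIONS = "/sys/kernel/debug/tracing/available_filter_functions"


def matching_functions(lines, pattern):
    """Return function names matching `pattern`, dropping any '[module]' tag."""
    regex = re.compile(pattern)
    names = []
    for line in lines:
        fields = line.split()
        # The first field is the function name; skip blank lines.
        if fields and regex.search(fields[0]):
            names.append(fields[0])
    return names


# Sample lines in the assumed file format (not taken from a real system).
sample = [
    "do_exit",
    "do_group_exit",
    "__x64_sys_exit_group",
    "nf_conntrack_in [nf_conntrack]",
]
print(matching_functions(sample, r"exit"))
```

On a real system you would pass an open file instead of `sample`, for example `matching_functions(open(FILTER_FUNCTIONS), r"exit")`, which typically requires root.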