Why does `events/syscalls/sys_enter*` not support string format output?

Question

I have a question about the events/syscalls/sys_enter* tracepoints. Why don't they support string formatting? For example, sys_enter_openat prints the filename as a hex value, not as a string.

$ cd /sys/kernel/debug/tracing
$ cat events/syscalls/sys_enter_openat/format
name: sys_enter_openat
ID: 623
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:int dfd;  offset:16;      size:8; signed:0;
        field:const char * filename;    offset:24;      size:8; signed:0;
        field:int flags;        offset:32;      size:8; signed:0;
        field:umode_t mode;     offset:40;      size:8; signed:0;

print fmt: "dfd: 0x%08lx, filename: 0x%08lx, flags: 0x%08lx, mode: 0x%08lx", ((unsigned long)(REC->dfd)), ((unsigned long)(REC->filename)), ((unsigned long)(REC->flags)), ((unsigned long)(REC->mode))
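To make the layout above concrete, here is a minimal sketch that parses the syscall-specific field lines and shows that `filename` occupies 8 bytes in the record, that is, only the pointer value. The field lines are copied from the output above; the names `FORMAT_TEXT`, `FIELD_RE` and `parse_fields` are made up for this illustration.

```python
import re

# Field lines copied from the format output above (common_* fields omitted).
FORMAT_TEXT = """\
        field:int __syscall_nr; offset:8;       size:4; signed:1;
        field:int dfd;  offset:16;      size:8; signed:0;
        field:const char * filename;    offset:24;      size:8; signed:0;
        field:int flags;        offset:32;      size:8; signed:0;
        field:umode_t mode;     offset:40;      size:8; signed:0;
"""

# One "field:<decl>; offset:N; size:N; signed:N;" entry per match.
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>\d+);"
)


def parse_fields(text):
    """Return a list of dicts describing each field in a format listing."""
    fields = []
    for match in FIELD_RE.finditer(text):
        decl = match.group("decl").strip()
        # The field name is the last space-separated token of the declaration.
        ctype, _, name = decl.rpartition(" ")
        fields.append({
            "name": name,
            "type": ctype.strip(),
            "offset": int(match.group("offset")),
            "size": int(match.group("size")),
        })
    return fields


fields = parse_fields(FORMAT_TEXT)
by_name = {f["name"]: f for f in fields}

for f in fields:
    print(f"{f['name']:<14} type={f['type']:<14} offset={f['offset']:<3} size={f['size']}")
```

The `filename` entry is declared as `const char *` with size 8, so the record holds an address rather than the characters it points to.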

I know I can get the filename as a string using a kprobe, but I don't understand why sys_enter_openat does not use %s in its output format by default, as follows.

print fmt: "dfd: 0x%08lx, filename: %s, ...

Is there a restriction that prevents the tracer from dereferencing the pointer? (By contrast, the format in events/sched/sched_switch/format uses %s to output a string.)

Related question: Change format of syscall event trace output to ftrace

"Is there any restriction the tracer cannot dereference the pointer?" - Yes, there is a restriction of that sort. When an event is **stored** in the buffer, the `format` is used: it specifies which fields are stored and their **sizes**. So it is possible to store a string, but only of a limited length (see the example with the `comm` field in https://www.kernel.org/doc/Documentation/trace/events.txt). When an event is **printed** (using `print fmt`), only the stored information can be used. Dereferencing is not possible at this stage, because all processes' contexts are lost. — Tsyvarev, Feb 17 '22 at 20:47
Thank you for your comment; it is helpful. I understand the condition under which the tracer can use a string. However, I have an additional question about the statement ` because all processes contexts are lost.` As far as I remember, kernel code can safely read a user-space address with something like `copy_from_user`, so I think the kernel could dereference the pointer at the `print fmt` stage. If I modified the implementation of the `print fmt` processing, could I get the string from memory? I'm sorry if I'm wrong; I'd appreciate it if you could answer. — m-bat, Feb 18 '22 at 00:37
Actually, when I use a kprobe, I can get the filename as a string, so I'd like to believe that tracepoints can do that as well. — m-bat, Feb 18 '22 at 02:08
Tracepoints and kprobes are **different mechanisms** for capturing data about events. Using a kprobe, you can perform custom actions at the time an event fires. At that time the process context is available and strings can be read. With tracepoints, triggering an event only **stores** the data about the event in the ring buffer. **Printing** this data using `print fmt` occurs **later**, when the process context is no longer available. — Tsyvarev, Feb 18 '22 at 07:26
Thank you for your comment. But I don't understand why tracepoints do not store the string data in the ring buffer when an event fires. At that time, I think tracepoints could dereference the pointer and store the data as a string, not an address. I think users want to know the actual data rather than the address (having both would be even better). Could you give me your thoughts? — m-bat, Feb 18 '22 at 11:23
"I don't understand why tracepoints do not store the string data in the ring buffer when an event fires." - Tracepoints could store a string in the ring buffer, but this would require much more memory. And the longer the string they want to store, the more memory will be needed for **every** record of that type. Have you checked the example at https://www.kernel.org/doc/Documentation/trace/events.txt, which I pointed to in my first comment? It describes how things work. — Tsyvarev, Feb 18 '22 at 11:55
"Tracepoints could store a string in the ring buffer, but this would require much more memory. And the longer the string they want to store, the more memory will be needed for every record of that type." - I understand it all clearly now. I had checked the docs, but I wanted to know the reason tracepoints do not store the string in the buffer. I do appreciate your answer! — m-bat, Feb 18 '22 at 12:10
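The store-now, print-later split described in the comment above can be illustrated with a toy model. This is not kernel code: `user_memory` stands in for a process's address space, `ring_buffer` for the trace buffer, and all function names and the address are hypothetical. It only shows that, once the memory behind a pointer is gone, the print stage can format nothing beyond what was copied into the record.

```python
ring_buffer = []  # stand-in for the trace ring buffer


def on_event_pointer_only(memory, addr):
    """Store stage: record only the pointer value, as sys_enter_openat does."""
    ring_buffer.append({"filename": addr})


def on_event_copy_string(memory, addr, max_len=32):
    """Store stage: copy up to max_len characters while the memory is readable."""
    ring_buffer.append({"filename": addr, "filename_str": memory[addr][:max_len]})


def print_record(record):
    """Print stage: only the stored record is available, not the process memory."""
    if "filename_str" in record:
        return "filename: %s" % record["filename_str"]
    return "filename: 0x%08x" % record["filename"]


# Hypothetical address space of the traced process at the time of the event.
user_memory = {0x7FFD1000: "/etc/hostname"}

on_event_pointer_only(user_memory, 0x7FFD1000)
on_event_copy_string(user_memory, 0x7FFD1000)

# Later the process exits or reuses the buffer, so the address is meaningless.
user_memory.clear()

lines = [print_record(record) for record in ring_buffer]
for line in lines:
    print(line)
```

The first record can only be shown as a hex address, while the second still prints the path because the characters were copied at store time.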
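A rough back-of-the-envelope sketch of that memory cost follows. It assumes the 48-byte record implied by the format in the question (last field at offset 40, size 8), assumes the string would be stored inline as a fixed-size array of `PATH_MAX` (4096) bytes in place of the pointer, and uses an arbitrary 1 MiB buffer chosen only for illustration. Ring-buffer headers and alignment are ignored.

```python
PATH_MAX = 4096                   # Linux limit on path length, in bytes
POINTER_SIZE = 8                  # size of the `filename` field in the format above
RECORD_SIZE_WITH_POINTER = 40 + 8  # offset + size of the last field (`mode`)
BUFFER_BYTES = 1024 * 1024        # assumed buffer size, for illustration only


def record_size_with_inline_string(string_bytes):
    """Record size if the pointer is replaced by a fixed-size char array."""
    return RECORD_SIZE_WITH_POINTER - POINTER_SIZE + string_bytes


def records_that_fit(record_size, buffer_bytes=BUFFER_BYTES):
    """How many records of the given size fit in the buffer (headers ignored)."""
    return buffer_bytes // record_size


pointer_records = records_that_fit(RECORD_SIZE_WITH_POINTER)
inline_records = records_that_fit(record_size_with_inline_string(PATH_MAX))

print("records with pointer only:", pointer_records)
print("records with inline path :", inline_records)
```

Under these assumptions, reserving room for a full path in every record reduces the number of events the buffer can hold by roughly two orders of magnitude.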
