
I'm currently getting into kernel tracing with LTTng and Perf. I'm especially interested in tracing the different states a process goes through.

I stumbled over the events sched_process_free and sched_process_exit. I'm wondering if my current understanding is correct:

When a process exits, sched_process_exit is written to the trace. However, the process descriptor might still be in memory, which makes the process a zombie. When all memory associated with the process has been freed, sched_process_free is emitted. This would mean that if I really want to be sure the process is fully "terminated" and removed from memory, I have to listen for sched_process_free instead of sched_process_exit in the trace. Is this correct?

juergen_p
  • I'm a little confused why there is no reply. If you have any problem with the answer, please tell me so that we can improve it. If it's helpful to you, could you please upvote or accept it? That's important to me. – tyChen Jan 11 '21 at 02:04

1 Answer


I found some time to edit my answer to make it clearer. If anything is still unclear, please tell me and we can discuss it. Let's dive into how a task ends:

There are two system calls, exit_group() and exit(), and both end up in do_exit(), which does the following:

  • set PF_EXITING, which marks the task as exiting
  • remove the task descriptor from any timer queue with del_timer_sync()
  • call exit_mm(), exit_sem(), __exit_fs() and others to release the structures of that task
  • call perf_event_exit_task(tsk);
  • decrease the ref count
  • set exit_code to the value passed to _exit()/exit_group(), or to an error code
  • call exit_notify()
    • update the relationships with the parent and the children
    • check exit_signal and send SIGCHLD
    • if exit_signal is -1 and the task is not traced, set exit_state to EXIT_DEAD and call release_task() to reclaim the remaining memory and decrease the ref count
    • otherwise (exit_signal is not -1, or the task is traced), set exit_state to EXIT_ZOMBIE
    • set task flag to PF_DEAD
  • call schedule()
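The two outcomes of exit_notify() in the list above can be sketched as a toy model. This is only an illustration in Python, not kernel code: the class `ToyTask`, the `auto_reap` flag (standing in for the kernel's actual condition), and the `reap()` method (standing in for the parent collecting the child later) are all hypothetical names. The real free also happens later than this model suggests, once the last reference to the task is dropped.

```python
class ToyTask:
    """Toy model of a task's exit path; records which tracepoints would fire."""

    def __init__(self, pid):
        self.pid = pid
        self.exit_state = None   # None while the task is still alive
        self.trace = []          # tracepoint names, in the order they fire

    def do_exit(self, auto_reap=False):
        # sched_process_exit is emitted on every exit.
        self.trace.append("sched_process_exit")
        if auto_reap:
            # Nobody needs to collect the task: release it immediately.
            self.exit_state = "EXIT_DEAD"
            self._release_task()
        else:
            # Keep the descriptor around until the parent collects it.
            self.exit_state = "EXIT_ZOMBIE"

    def reap(self):
        # Hypothetical stand-in for the parent collecting a zombie child.
        if self.exit_state != "EXIT_ZOMBIE":
            raise RuntimeError("only a zombie can be reaped")
        self.exit_state = "EXIT_DEAD"
        self._release_task()

    def _release_task(self):
        # sched_process_free is emitted only once the descriptor is released.
        self.trace.append("sched_process_free")


if __name__ == "__main__":
    zombie = ToyTask(101)
    zombie.do_exit()
    print(zombie.exit_state, zombie.trace)
    zombie.reap()
    print(zombie.exit_state, zombie.trace)
```

In this model a task that becomes a zombie has only sched_process_exit in its trace until it is reaped, while an auto-reaped task gets both events back to back.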

We need the zombie state because the parent may still need information from the child's process descriptor (such as its exit status), so we cannot delete everything right away. The parent task has to call something like wait() to check whether the child is dead. After wait(), it is time for the zombie to be released completely by release_task(), which does the following:

  • decrease the number of processes belonging to the task's owner
  • if the task is traced, delete from the ptrace_children list
  • call __exit_signal() to delete all pending signals and release the signal_struct descriptor, and exit_itimers() to delete all timers
  • call __exit_sighand() to delete the signal handlers
  • call __unhash_process()
    • nr_threads--
    • call detach_pid() to remove the task descriptor from the PIDTYPE_PID and PIDTYPE_TGID hash tables
    • call REMOVE_LINKS to remove the task from the process list
  • call sched_exit() to adjust the parent's timeslice
  • call put_task_struct() to decrease the usage counter and release the memory and the task descriptor
  • call delayed_put_task_struct(), which is where trace_sched_process_free is called
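The zombie window between exit and wait() can be observed from user space. The following is a minimal sketch, assuming a Linux system with /proc mounted; the helper names `proc_state` and `observe_zombie` are made up for this example. It uses `os.waitid()` with `WNOWAIT` to block until the child has exited without reaping it, then reads the state letter from /proc/<pid>/stat before and after the real `waitpid()`.

```python
import os


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state letter is the first field after the closing ")" of the command name.
    return data.rsplit(")", 1)[1].split()[0]


def observe_zombie():
    """Fork a child that exits at once; report its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    before = proc_state(pid)
    _, status = os.waitpid(pid, 0)  # the parent now reaps the child
    after = proc_state(pid)
    return before, os.WEXITSTATUS(status), after


if __name__ == "__main__":
    print(observe_zombie())
```

Before the `waitpid()` call the child should show up in state Z (zombie); afterwards its /proc entry should be gone, which corresponds to the release_task() path described above.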

So sched_process_exit is emitted in do_exit(), but at that point we cannot tell whether the process has already been released: release_task(), which eventually triggers sched_process_free, may or may not have been called yet. That is why both tracepoints are needed.
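When post-processing a trace, this means pairing the two events per task. Here is a minimal sketch, assuming the trace has already been parsed into `(event_name, pid)` tuples; that record format and the function name `exited_but_not_freed` are assumptions for illustration, not something LTTng or Perf produce directly.

```python
def exited_but_not_freed(events):
    """Return the sorted pids that emitted sched_process_exit but no sched_process_free yet.

    `events` is an iterable of (event_name, pid) tuples in trace order.
    """
    pending = set()
    for name, pid in events:
        if name == "sched_process_exit":
            pending.add(pid)       # exited, descriptor may still be around
        elif name == "sched_process_free":
            pending.discard(pid)   # descriptor released, task is really gone
    return sorted(pending)


if __name__ == "__main__":
    sample = [
        ("sched_process_exit", 101),
        ("sched_process_exit", 102),
        ("sched_process_free", 101),
    ]
    print(exited_but_not_freed(sample))
```

In the sample, pid 101 has both events and is fully gone, while pid 102 has exited but has not been freed within the traced window.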

tyChen
  • Hi @tyChen, sorry for my late reply and thanks for your answer. I have to confess that I currently don't see how this supports my thoughts. What am I missing here? Can you point out the exact reason? – juergen_p Jan 11 '21 at 07:10
  • What do you think about [this code section](https://code.woboq.org/linux/linux/kernel/exit.c.html)? There we can find `release_task` with the call to `delayed_put_task_struct`, which in turn contains `trace_sched_process_free`. – juergen_p Jan 11 '21 at 07:40
  • Hi @tyChen, it seems your last edit now completes my picture. I will go through it in more detail, but it already looks promising. Afterwards I will resolve this topic. Thanks for your effort already. – juergen_p Jan 12 '21 at 16:36