
I'm sorry that I can't post code to reproduce this. My problem is precisely that I don't know how to go about debugging this issue.

I am using ptrace with PTRACE_O_TRACEFORK | PTRACE_O_TRACEEXEC | PTRACE_O_TRACEVFORK | PTRACE_O_TRACEVFORKDONE | PTRACE_O_TRACECLONE to trace a process and its children (and the children's children). The mechanism is much like strace, but with a slightly different purpose: I'm just tracking which files are read or modified.
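
For concreteness, the setup looks roughly like this (a simplified sketch, not my actual tracer; error handling, signal forwarding, and the handling of a new child's initial stop are omitted):

/*
 * Minimal sketch of the setup described above (not the actual tracer):
 * run a command under ptrace and register the same set of
 * fork/vfork/clone/exec events.
 */
#include <err.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        errx(1, "usage: %s command [args...]", argv[0]);

    pid_t child = fork();
    if (child == -1)
        err(1, "fork");
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], argv + 1);
        err(1, "execvp");
    }

    int status;
    waitpid(child, &status, 0);            /* child stops with SIGTRAP at exec */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           PTRACE_O_TRACEFORK | PTRACE_O_TRACEEXEC |
           PTRACE_O_TRACEVFORK | PTRACE_O_TRACEVFORKDONE |
           PTRACE_O_TRACECLONE);
    ptrace(PTRACE_CONT, child, NULL, 0);

    for (;;) {
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid == -1)
            break;                         /* no tracees left */
        if (WIFSTOPPED(status)) {
            /* status >> 16 carries PTRACE_EVENT_FORK/VFORK/CLONE/EXEC
             * when one of the registered events fired; a real tracer
             * must also forward ordinary signals rather than passing 0. */
            ptrace(PTRACE_CONT, pid, NULL, 0);
        }
    }
    return 0;
}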

My code (written in C) works fine on Debian wheezy and Debian jessie on the x86-64 architecture (and, with less testing, on i386). When I try to compile and run it on an Ubuntu Precise x86-64 virtual machine (which uses a 3.2.0 kernel), I run into trouble.

On the Precise machine, I sometimes find that I do not receive a PTRACE_EVENT_VFORK immediately after a vfork call happens, but instead start receiving events (a couple of SIGSTOP events, and a few system calls) without ever getting the PTRACE_EVENT_VFORK event. I don't see anything suspicious in the system calls being performed, and the behavior is not predictable.

I'm not sure how to reduce this to a minimal test case, and I really have no idea what might be going wrong, having never before seen this behavior of missing events. It is conceivable that the difference is not the kernel, but rather the build tools that I am tracing (a combination of python and gcc).

Any suggestions?

David Roundy
  • If nobody here can help, try asking on the linux-kernel mailing list. (Unlikely to help, but worth a shot.) As an alternative to `ptrace`, you could use the [LD_PRELOAD trick](http://stackoverflow.com/q/426230/) to intercept calls to `open`, `read`, `write`, and `close` (a rough sketch of that approach follows after these comments). And good luck; this sounds nasty. – Nemo May 02 '15 at 01:10
  • I've avoided LD_PRELOAD, since I want my code to be able to trace statically linked binaries. And frankly, I'm scared of linux-kernel! lol :) – David Roundy May 02 '15 at 01:11
  • I agree LD_PRELOAD is not a sane/valid way to do this. Unfortunately I don't know the cause of the vfork trace failures. If you can use the seccomp tracing mode rather than the legacy ptrace style it might be less error-prone and more portable, though. – R.. GitHub STOP HELPING ICE May 02 '15 at 01:15
  • I'm not so familiar with seccomp tracing. I've mostly heard of it in terms of the BPF version, which seems too new to be portable. Is there an older and more stable version that goes back to kernel 3.2? – David Roundy May 02 '15 at 01:17
  • Statically-linked binaries are not very common... The glibc folks hate static linking and seem to enjoy breaking it any way they can. I am just suggesting an alternative to tracking down what sounds like a tricky kernel bug. You will almost certainly want a reproducible test case for the latter... Have you tried simple things like calling vfork() repeatedly in a loop? – Nemo May 02 '15 at 01:19
  • @Nemo, regarding statically linked binaries, anything compiled with Go is the major counterexample, which is an increasingly large hole. I'll try your suggestion when I get a chance. – David Roundy May 02 '15 at 01:23
  • @DavidRoundy: A ptrace race condition was [fixed in 3.4](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed), but I'm unsure whether it could cause what you are seeing. I can think of one other possibility if and only if your process is multi-threaded, namely two different tasks causing the same process to be stopped, with the vfork one losing the race (STOP/CONT signals do not nest or queue). – Nominal Animal Sep 18 '15 at 16:23
  • Furthermore, the kernel git log regarding [ptrace](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?qt=grep&q=ptrace) does not show any likely suspects, nor does a [ptrace vfork search](http://marc.info/?l=linux-kernel&w=4&r=1&s=ptrace+vfork&q=b) of the linux-kernel mailing list. – Nominal Animal Sep 18 '15 at 16:25
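
For reference, the LD_PRELOAD trick Nemo mentions would look roughly like the minimal interposer below. It is only a sketch: it catches only dynamically linked programs (which is exactly the objection raised above), a real tool would also wrap open64, openat, fopen, read, write, and close the same way, and the file names in the build commands are illustrative.

/*
 * Minimal LD_PRELOAD interposer sketch.  Build and use roughly as:
 *     gcc -shared -fPIC -o libtrace.so trace.c -ldl
 *     LD_PRELOAD=$PWD/libtrace.so make
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

/* Wrap open(); read, write, close (and openat, open64, ...) would be
 * wrapped the same way. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...);
    mode_t mode = 0;

    if (real_open == NULL)
        real_open = (int (*)(const char *, int, ...)) dlsym(RTLD_NEXT, "open");

    if (flags & O_CREAT) {          /* the mode argument exists only with O_CREAT */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    fprintf(stderr, "open(\"%s\", %#x)\n", path, (unsigned) flags);
    return real_open(path, flags, mode);
}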

2 Answers


I was working on something similar recently. I suspect you've solved your problem long ago or given up, but let me write an answer here for posterity.

The events you register with PTRACE_SETOPTIONS generate stops that look different from the normal ptrace stops, but the normal stops are still delivered. One of those normal stops is that a newly forked process starts out stopped and has to be continued by the tracer.

This means that if you are watching fork events with PTRACE_O_TRACEFORK (or PTRACE_O_TRACEVFORK), waitpid will trigger twice for the same process after a fork.

One will be with a status that is:

WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP

The other one will be with:

WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP &&
    (status >> 16) == PTRACE_EVENT_FORK /* or PTRACE_EVENT_VFORK */

There does not seem to be any guarantee from the kernel in which order they will arrive. I found it being close to 50/50 on my system.

To handle this my code looks something like this:

static void
proc_register(struct magic *pwi, pid_t pid, bool fork) {
    /*
     * When a new process starts two things happen:
     *  - We get a wait with STOPPED, SIGTRAP, PTRACE_EVENT_{CLONE,FORK,VFORK}
     *  - We get a wait with STOPPED, SIGSTOP
     *
     * Those can come in any order, so to get the proc in the right
     * state this function should be called twice on every new proc. If
     * it's called with fork first, we set the state to NEW_FORKED, if
     * it's called with STOP first, we set NEW_STOPPED. Then when the
     * other call comes, we set the state to TRACED and continue the
     * process.
     */
    struct proc *p;  /* per-tracee bookkeeping; the struct itself is not shown here */

    if ((p = find_proc(pwi, pid)) == NULL) {
            p = calloc(1, sizeof(*p));
            p->pid = pid;
            TAILQ_INSERT_TAIL(&pwi->procs, p, list);
            if (fork) {
                    p->state = NEW_FORKED;
            } else {
                    p->state = NEW_STOPPED;
            }
    } else {
            assert((fork && p->state == NEW_STOPPED) || (!fork && p->state == NEW_FORKED));
            p->state = TRACED;
            int flags = PTRACE_O_TRACEEXEC|PTRACE_O_TRACEEXIT|PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORK;

            if (ptrace(PTRACE_SETOPTIONS, pid, NULL, flags))
                    err(1, "ptrace(SETOPTIONS, %d)", pid);
            if (ptrace(PTRACE_CONT, pid, NULL, 0) == -1)  /* 0: no signal to deliver */
                    err(1, "ptrace(CONT, %d)", pid);
    }
}
[...]
    pid = waitpid(-1, &status, __WALL);
    if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGSTOP) {
            proc_register(magic, pid, false);
    } else if (WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP && ((status >> 16) == PTRACE_EVENT_FORK)) {
            proc_register(magic, pid, true);
    } else {
            /* ... */
    }

The key to making this work was to not send PTRACE_CONT until we had received both events. While figuring this out I was sending PTRACE_CONT far too often, and the kernel happily accepted them, which sometimes led to my processes exiting long before PTRACE_EVENT_FORK arrived. This made it quite hard to debug.

N.B. I haven't found any documentation about this, or anything saying that this is the way it is supposed to work. I just found that this is what makes things work today. YMMV.

Art

I've bumped into this page several times (for different reasons). If the tracer traces lots of tracees and there are lots of events (for example when a seccomp filter returns SECCOMP_RET_TRACE), waitpid(-1, ...) may not be the best way to wait for tracees: many tracees can change state at once, especially on SMP systems (who still runs a UP system?), so a large number of events can arrive in a very short time, and the OP is right that they can be out of order: some event or signal can arrive even before PTRACE_EVENT_FORK.

However, this is not the case (no out-of-order events) when the tracer calls waitpid(specific_pid_greater_than_zero, ...), i.e. waits for one specific tracee only. Granted, your program model might not look as elegant or simple: you may need to track each tracee's state (blocked or not) and decide when, and which, tracee to continue with PTRACE_CONT. But the bonus is that you don't have to worry about hacky ways of handling out-of-order events (which are hard to get right).
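
A rough sketch of what that per-tracee wait might look like (the struct and its fields here are hypothetical; the point is only the waitpid(pid, ...) pattern):

#include <stdbool.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical bookkeeping: one entry per tracee the tracer knows about. */
struct tracee {
    pid_t pid;
    bool  running;   /* true after PTRACE_CONT, false while stopped */
};

/* Wait on one specific tracee instead of waitpid(-1, ...): its stops
 * arrive in order, and no other tracee's events are mixed into this call. */
static void wait_one(struct tracee *t)
{
    int status;

    if (waitpid(t->pid, &status, __WALL) == -1)
        return;
    t->running = false;

    if (WIFSTOPPED(status)) {
        /* ... inspect WSTOPSIG(status) and status >> 16 here, then
         * decide when to resume this particular tracee ... */
        ptrace(PTRACE_CONT, t->pid, NULL, 0);
        t->running = true;
    }
}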

wbj