I've run into an issue with the Linux futex syscall (FUTEX_WAIT operation) sometimes returning early, seemingly without cause. The documentation specifies certain conditions that may cause it to return early (without a FUTEX_WAKE), but these all report an error rather than a return value of 0: EAGAIN if the value at the futex address does not match, ETIMEDOUT for timed waits that time out, EINTR when interrupted by a (non-restarting) signal, etc. But I'm seeing a return value of 0. What, other than FUTEX_WAKE or the termination of a thread whose set_tid_address pointer points to the futex, could cause FUTEX_WAIT to return with a return value of 0?

In case it's useful, the particular futex I was waiting on is the thread tid address (set by the clone syscall with CLONE_CHILD_CLEARTID), and the thread had not terminated. My (apparently incorrect) assumption that the FUTEX_WAIT operation returning 0 could only happen when the thread terminated led to serious errors in program logic, which I've since fixed by looping and retrying even if it returns 0, but now I'm curious as to why it happened.

Here is a minimal test case:

#define _GNU_SOURCE
#include <sched.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/futex.h>
#include <signal.h>

static char stack[32768];
static int tid;

static int foo(void *p)
{
        (void)p;
        syscall(SYS_getpid);
        syscall(SYS_getpid);
        /* Exit only this thread, not the whole process. */
        syscall(SYS_exit, 0);
        return 0; /* not reached */
}

int main()
{
        int pid = getpid();
        for (;;) {
                int x = clone(foo, stack+sizeof stack,
                        CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND
                        |CLONE_THREAD|CLONE_SYSVSEM //|CLONE_SETTLS
                        |CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID
                        |CLONE_DETACHED,
                        0, &tid, 0, &tid);
                syscall(SYS_futex, &tid, FUTEX_WAIT, x, 0);
                /* Should fail... */
                syscall(SYS_tgkill, pid, tid, SIGKILL);
        }
}

Let it run for a while, and it should eventually terminate with Killed (SIGKILL), which is only possible if the thread still exists when the FUTEX_WAIT returns.

Before anyone goes assuming this is just the kernel waking the futex before it finishes destroying the thread (which might in fact be happening in my minimal test case here), please note that in my original code, I actually observed userspace code running in the thread well after FUTEX_WAIT returned.
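
For reference, the workaround mentioned above (looping and retrying even when FUTEX_WAIT returns 0) can be sketched roughly as follows. This is a minimal sketch rather than the original code: the helper name wait_for_thread_exit is hypothetical, and it assumes the tid word is cleared to 0 by the kernel on thread exit, as CLONE_CHILD_CLEARTID is documented to do.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original program): block until the
 * word at tidptr becomes 0. A return of 0 from FUTEX_WAIT is treated only
 * as a hint to re-read the word, never as proof that the thread has exited.
 */
static void wait_for_thread_exit(volatile int *tidptr)
{
        int cur;

        while ((cur = *tidptr) != 0) {
                /* Sleeps only if *tidptr still equals cur; otherwise the
                 * call fails immediately with EAGAIN. */
                syscall(SYS_futex, tidptr, FUTEX_WAIT, cur, NULL);
                /* Whatever the outcome (0, EAGAIN, EINTR), loop and
                 * check the word again. */
        }
}
```

In the test case, the bare FUTEX_WAIT call would be replaced by wait_for_thread_exit(&tid), so the tgkill is only reached once tid has actually been cleared.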

R.. GitHub STOP HELPING ICE
  • I think we may need to see a minimal example; it's hard to come up with substantial advice, since so much is unknown (I'll post my one hunch as a temporary answer anyway, because it's too big for a comment) – sehe Sep 11 '11 at 20:57
  • Indeed, I'll see if I can put together a minimal example. – R.. GitHub STOP HELPING ICE Sep 11 '11 at 21:12
  • hm, I think the man page is quite unclear. the conditions under the return value of `FUTEX_WAIT` qualifies the non zero conditions as *error* conditions, not only diagnostics. Then later it says "In the event of an error, all operations return -1, and set errno to indicate the error." On the other hand the conditions here are not repeated in the **ERRORS** section. – Jens Gustedt Sep 11 '11 at 21:25
  • And I just confirmed with `strace` that the "child thread" has not yet called `_exit` when `FUTEX_WAIT` returns. – R.. GitHub STOP HELPING ICE Sep 11 '11 at 23:01
  • It is probably worth asking this on the linux kernel mailing list. – caf Sep 12 '11 at 00:50
  • If you do, please post the answer back here ... I'm curious to know as well ... – Jason Sep 12 '11 at 01:54
  • @R.. Did you ever get any answers on this? – Jason Oct 05 '11 at 20:20
  • The documentation states that EWOULDBLOCK is returned, not EAGAIN. On most systems these have the same numeric value, but not on SPARC. – Display Name Dec 08 '11 at 01:28
  • @Jason: No, I didn't follow up much more.. – R.. GitHub STOP HELPING ICE Dec 08 '11 at 05:30

1 Answer


Could you be dealing with a race condition between whether the parent or child operations complete first? You can probably investigate this theory by putting small sleeps at the beginning of your foo() or immediately after the clone() to determine if a forced sequencing of events masks the issue. I don't recommend fixing anything in that manner, but it could be helpful to investigate. Maybe the futex isn't ready to be waited upon until the child gets further through its initialization, but the parent's clone has enough to return to the caller?

Specifically, the CLONE_VFORK option's presence seems to imply this is a dangerous scenario. You may need a bi-directional signaling mechanism such that the child signals the parent that it has gotten far enough that it is safe to wait for the child.

  • If `tid` had not already been written with the tid value at the time `FUTEX_WAIT` is called, the operation would return with `EAGAIN` rather than 0. (Anyway, the whole point of the `CLONE_PARENT_SETTID` flag to `clone` is to ensure that the value has been written before either thread is able to execute.) I don't see any possibility for a race here in userspace since nothing interesting is happening in userspace... – R.. GitHub STOP HELPING ICE Sep 22 '11 at 16:15
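
The behaviour described in the comment above (a value mismatch makes FUTEX_WAIT fail rather than return 0) can be checked in isolation. The following is a small sketch, assuming a Linux system where the raw futex syscall is available; the helper name futex_wait_mismatch_errno is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: call FUTEX_WAIT with an expected value that
 * deliberately differs from the futex word. Returns the resulting errno,
 * or 0 if the call unexpectedly succeeded.
 */
static int futex_wait_mismatch_errno(void)
{
        int word = 1;   /* actual value of the futex word */
        long r;

        errno = 0;
        /* Expected value 0 != 1, so the kernel refuses to sleep. */
        r = syscall(SYS_futex, &word, FUTEX_WAIT, 0, NULL);
        return r == -1 ? errno : 0;
}
```

Because the mismatch is detected before the caller is put to sleep, the call returns immediately, and on Linux the reported error is EAGAIN (the same numeric value as EWOULDBLOCK there).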