
In a comment on the question Automatically release mutex on crashes in Unix back in 2010, jilles claimed:

glibc's robust mutexes are so fast because glibc takes dangerous shortcuts. There is no guarantee that the mutex still exists when the kernel marks it as "will cause EOWNERDEAD". If the mutex was destroyed and the memory replaced by a memory mapped file that happens to contain the last owning thread's ID at the right place and the last owning thread terminates just after writing the lock word (but before fully removing the mutex from its list of owned mutexes), the file is corrupted. Solaris and will-be-FreeBSD9 robust mutexes are slower because they do not want to take this risk.

I can't make any sense of the claim, since destroying a mutex is not legal unless it's unlocked (and thus not in any thread's robust list). I also can't find any references searching for such a bug/issue. Was the claim simply erroneous?

The reason I ask, and the reason I'm interested, is that this is relevant to the correctness of my own implementation built upon the same Linux robust-mutex primitive.

R.. GitHub STOP HELPING ICE
  • omg, the error name has NERD in it – Johannes Schaub - litb Aug 17 '12 at 07:28
  • It seems the old VMA-based approach had some issues at least; see http://www.kernel.org/doc/Documentation/robust-futexes.txt. However, if I read it correctly the list is maintained in user-space memory - so what are you to do if that memory is corrupt? Though that can perhaps just be viewed as a special case of corrupting the shared memory. – nos Aug 17 '12 at 11:37
  • Yes, I see that the list or even the mutex contents could be corrupted if the process runs amok and clobbers them. Is this the issue being described? I'm not worried about ensuring proper behavior when a process with access to the mutex has invoked undefined behavior; I'm just concerned about the possibility of some race condition in otherwise-well-defined use of the robust mutex. – R.. GitHub STOP HELPING ICE Aug 17 '12 at 12:44
  • I think I figured it out on my own, but I'd be happy to award the bounty to anybody interested in providing more on the topic. – R.. GitHub STOP HELPING ICE Aug 17 '12 at 18:40

2 Answers


I think I found the race, and it is indeed very ugly. It goes like this:

Thread A holds the robust mutex and is unlocking it. The basic procedure (sketched in code just below the list) is:

  1. Put it in the "pending" slot of the thread's robust list header.
  2. Remove it from the linked list of robust mutexes held by the current thread.
  3. Unlock the mutex.
  4. Clear the "pending" slot of the thread's robust list header.
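
Here is a minimal sketch of that sequence, assuming a simplified layout: the list header mirrors the Linux UAPI `struct robust_list_head`, while the mutex type `my_robust_mutex`, the helper `remove_from_robust_list`, and the function names are hypothetical stand-ins rather than glibc's actual code (waiter wakeup and error handling are omitted):

```c
#include <stdatomic.h>
#include <stddef.h>

struct robust_list { struct robust_list *next; };

struct robust_list_head {                /* one per thread, registered with  */
    struct robust_list list;             /* the set_robust_list() syscall    */
    long futex_offset;                   /* node address + offset = futex    */
    struct robust_list *list_op_pending; /* the "pending" slot               */
};

struct my_robust_mutex {                 /* hypothetical mutex layout        */
    struct robust_list node;             /* the list node lives in the mutex */
    _Atomic unsigned int futex;          /* low 30 bits hold the owner TID   */
};

static _Thread_local struct robust_list_head my_head;

static void remove_from_robust_list(struct robust_list_head *h,
                                    struct robust_list *node)
{
    /* Toy singly-linked removal; glibc really keeps a doubly linked
       list, but only the next pointers matter to the kernel. */
    struct robust_list *p = &h->list;
    while (p->next && p->next != &h->list) {
        if (p->next == node) { p->next = node->next; return; }
        p = p->next;
    }
}

static void robust_unlock(struct my_robust_mutex *m)
{
    /* 1. Advertise the operation in the pending slot. */
    my_head.list_op_pending = &m->node;

    /* 2. Unlink the mutex from this thread's robust list. */
    remove_from_robust_list(&my_head, &m->node);

    /* 3. Release the lock word (FUTEX_WAKE of waiters omitted).  From
          this instant on, another thread may lock, unlock, destroy and
          munmap the mutex.                                             */
    atomic_store_explicit(&m->futex, 0, memory_order_release);

    /* <-- the dangerous window: list_op_pending still holds the old
           address, and the kernel will dereference it if we die here. */

    /* 4. Clear the pending slot. */
    my_head.list_op_pending = NULL;
}
```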

The problem is that between steps 3 and 4, another thread in the same process could acquire the mutex, unlock it, and (rightly believing itself to be the final user) destroy and free/munmap it. After that, if any thread in the process creates a shared mapping of a file, device, or shared memory that happens to be assigned the same address, and the 32-bit value at the corresponding location happens to match the thread ID of the thread that is still between steps 3 and 4 of the unlock, then, when the process is killed, the kernel will corrupt the mapped file: while walking the dead thread's robust list it sets the owner-died bit (FUTEX_OWNER_DIED) in a value it believes to be the lock word of a robust mutex.
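
To see why the stale "pending" pointer is what does the damage, here is a rough user-space paraphrase of the walk the kernel performs when the thread dies, modeled on the kernel's `exit_robust_list`/`handle_futex_death`; fault handling, the list-length limit, PI mutexes and waiter wakeups are omitted, and the simplifications are mine:

```c
#include <stdint.h>

#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

struct robust_list { struct robust_list *next; };
struct robust_list_head {
    struct robust_list list;
    long futex_offset;
    struct robust_list *list_op_pending;
};

/* Mark one lock word dead if (and only if) its TID field names the
 * dying thread.  This match is the kernel's only sanity check. */
static void handle_futex_death(uint32_t *uaddr, uint32_t dead_tid)
{
    uint32_t val = *uaddr;
    if ((val & FUTEX_TID_MASK) == dead_tid) {
        uint32_t newval = (val & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
        __atomic_compare_exchange_n(uaddr, &val, newval, 0,
                                    __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
        /* real kernel: retry on races and wake any waiters */
    }
}

static void exit_robust_list(struct robust_list_head *head, uint32_t dead_tid)
{
    struct robust_list *entry = head->list.next;
    struct robust_list *pending = head->list_op_pending;

    while (entry != &head->list) {
        if (entry != pending)
            handle_futex_death(
                (uint32_t *)((char *)entry + head->futex_offset), dead_tid);
        entry = entry->next;
    }

    /* The "pending" slot is handled last.  If the thread died between
     * steps 3 and 4, this pointer may now land in a freshly created
     * shared mapping whose contents merely *look* like a lock word. */
    if (pending)
        handle_futex_death(
            (uint32_t *)((char *)pending + head->futex_offset), dead_tid);
}
```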

The solution is to hold a global lock on mmap/munmap between steps 2 and 4 above, exactly the same as in my solution to the barrier issue described in my answer to this question:

Can a correct fail-safe process-shared barrier be implemented on Linux?
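
One way to realize that, as a hedged sketch reusing the hypothetical `my_robust_mutex`, `my_head` and `remove_from_robust_list` from the unlock sketch above (the `vm_lock` name and the wrapper functions are mine, not glibc's or musl's actual API):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>

static pthread_rwlock_t vm_lock = PTHREAD_RWLOCK_INITIALIZER;

static void robust_unlock_fixed(struct my_robust_mutex *m)
{
    my_head.list_op_pending = &m->node;                         /* step 1 */

    pthread_rwlock_rdlock(&vm_lock);   /* exclude mmap/munmap ...         */
    remove_from_robust_list(&my_head, &m->node);                /* step 2 */
    atomic_store_explicit(&m->futex, 0, memory_order_release);  /* step 3 */
    my_head.list_op_pending = NULL;                             /* step 4 */
    pthread_rwlock_unlock(&vm_lock);   /* ... until pending is cleared    */
}

static int my_munmap(void *addr, size_t len)
{
    /* While we hold vm_lock exclusively, no unlocker can be inside its
     * step 2-4 window, so no stale list_op_pending can point into the
     * range being unmapped (and nothing new can be mapped over it). */
    pthread_rwlock_wrlock(&vm_lock);
    int r = munmap(addr, len);
    pthread_rwlock_unlock(&vm_lock);
    return r;
}
/* An mmap wrapper for shared mappings takes the write lock the same way. */
```

A process-local lock suffices because mappings are per-process: the stale pending pointer is only dangerous if this same process replaces the mapping at that address. The price is that every munmap, and every mmap that creates a shared mapping, has to go through the wrappers, which is easy in one's own implementation but hard to retrofit onto an existing libc (see the comments below).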

R.. GitHub STOP HELPING ICE
  • @r Do I understand it right that you can only work around this problem when you're writing your own mutex implementation and have control over `mmap`/`munmap` calls (so that you can insert the locks between steps 2 and 4)? Does that mean POSIX (pthreads) robust mutexes will always suffer from this problem? Should that be considered a pthreads bug on Linux? Also, what do you do when the holder of the (robust or non-robust?) global mmap lock fails? – nh2 Nov 26 '14 at 23:35
  • Yes it's a bug in glibc. See https://sourceware.org/bugzilla/show_bug.cgi?id=14485 – R.. GitHub STOP HELPING ICE Nov 30 '14 at 00:46

The description of the race by FreeBSD pthread developer David Xu: http://lists.freebsd.org/pipermail/svn-src-user/2010-November/003668.html

I don't think the munmap/mmap cycle is strictly required for the race; the piece of shared memory might instead be put to a different use while it remains mapped. This is uncommon, but valid.

As also mentioned in that message, more "fun" occurs if threads with different privileges access a common robust mutex. Because the node for the list of owned robust mutexes is in the mutex itself, a low-privilege thread can corrupt a high-privilege thread's list. This could easily be exploited to make the high-privilege thread crash, and in rare cases it might allow the high-privilege thread's memory to be corrupted. Apparently Linux's robust mutexes are only designed for use by threads with the same privileges. This could have been avoided easily by making the robust list an array fully in the thread's own memory instead of a linked list.
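
For reference, the intrusive layout is visible in the kernel's UAPI: a thread registers a single header with `set_robust_list()`, but every list node the kernel will later follow is embedded in a mutex, and for a process-shared robust mutex that node sits in memory writable by every party that maps it. A minimal registration sketch follows; it is illustrative only, since in a real program glibc has already registered its own header per thread and it must not be replaced:

```c
#define _GNU_SOURCE
#include <linux/futex.h>     /* struct robust_list, struct robust_list_head */
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static struct robust_list_head head;    /* lives in this thread's memory */

static int register_robust_list(void)
{
    head.list.next = &head.list;  /* empty circular list                    */
    head.futex_offset = 0;        /* libc-chosen offset from node to futex  */
    head.list_op_pending = NULL;

    /* One call per thread; afterwards the kernel will follow the next
     * pointers stored *inside the mutexes* when this thread dies. */
    return (int)syscall(SYS_set_robust_list, &head, sizeof head);
}
```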

jilles
  • Thanks for catching the issue about putting the memory to a different use without unmapping it. This should be avoidable by putting some kind of synchronization in `pthread_mutex_destroy`... – R.. GitHub STOP HELPING ICE Aug 17 '12 at 23:06
  • Actually, I question whether it is possible to avoid that issue. Within the same process it can definitely be avoided, but between multiple processes, there's no obvious way to "wait for any process that may still have a stale pending pointer to the mutex to drop it". I don't think this issue is quite as severe as the "new mapping" issue (since it only affects a very special usage case, reusing the same shared memory for a different purpose), but it's still quite troubling. – R.. GitHub STOP HELPING ICE Aug 17 '12 at 23:13
  • Regarding your final sentence, the "easy" solution using an array instead of a linked list is not actually easy. You don't have storage for an array. The storage requirement grows linearly with the number of robust mutexes the thread holds, and the only way to get that storage without additional allocation (which could fail) at lock time is to put it in the mutexes themselves. I agree it's unfortunate, but there seems to be no other way. Most kernel-space solutions also require allocation at lock time, which can fail, rendering the lock rather useless... – R.. GitHub STOP HELPING ICE Aug 17 '12 at 23:19
  • To avoid spending too many resources in the kernel, Linux limits the length of the linked list anyway. So if you lock too many robust mutexes, everything works except the unlock when the process terminates. It seems to make more sense to fail `pthread_mutex_lock` in that case. Also, given that the return value of `pthread_mutex_lock` on a robust mutex must already be checked (for [EOWNERDEAD] and [ENOTRECOVERABLE]), I think it is acceptable for this operation to fail because of lack of memory or the like. – jilles Aug 18 '12 at 00:09
  • The limit is something huge like 1 million, last I checked. Anyway, their reason is not to limit resources but to avoid loops in the linked list, which is a silly reason since they could just clear the next pointers as they walk the linked list, and thereby eliminate the possibility of loops. – R.. GitHub STOP HELPING ICE Aug 18 '12 at 00:14
  • Also, assuming your program can recover the state of the protected object, `EOWNERDEAD` is a condition you can handle. `ENOMEM` or similar is not a condition you can handle; there's no way the program can proceed if it cannot obtain the lock or block waiting for it. Keep in mind `pthread_mutex_lock` could be called from a cancellation handler where the mutex needs to be locked as an invariant for the next-level-up cancellation handler... – R.. GitHub STOP HELPING ICE Aug 18 '12 at 00:15
  • Accepted, and I'll probably award the bounty to this answer too. I think I did a better job explaining the race, but you dug up the reference you were originally talking about which is what I was looking for, and also found a worse variant of the race (not involving unmapping) than the one I realized. – R.. GitHub STOP HELPING ICE Aug 22 '12 at 06:34