
I've been looking at glibc/nptl's implementation of cancellation points, and comparing it to POSIX, and unless I'm mistaken it's completely wrong. The basic model used is:

```c
int oldtype = LIBC_ASYNC_CANCEL(); /* switch to asynchronous cancellation mode */
int result = INLINE_SYSCALL(...);
LIBC_CANCEL_RESET(oldtype);
```

According to POSIX:

The side-effects of acting upon a cancellation request while suspended during a call of a function are the same as the side-effects that may be seen in a single-threaded program when a call to a function is interrupted by a signal and the given function returns [EINTR]. Any such side-effects occur before any cancellation cleanup handlers are called.

My reading of this passage is that if I call open, I can expect one of two outcomes: either it gets cancelled (along with my whole thread) without having opened the file, or it returns a valid file descriptor, or -1 with errno set. It should never create a new file descriptor and then lose it into the void. The glibc/nptl implementation of cancellation points, however, seems to allow a race condition in which the cancellation request arrives just after the syscall returns but before LIBC_CANCEL_RESET takes place.

Am I crazy, or is their implementation really this broken? And if so, does POSIX allow such broken behavior (which seems to render cancellation completely unusable unless you defer it manually), or are they just blatantly ignoring POSIX?

If this behavior is in fact broken, what's the correct way to implement it without such a race condition?

R.. GitHub STOP HELPING ICE

2 Answers


Isn't this clarified in the next paragraph of the standard:

However, if the thread is suspended at a cancellation point and the event for which it is waiting occurs before the cancellation request is acted upon, it is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution.

Which implies that this race condition is perfectly legal behaviour.

cmeerw
  • How does one write a robust program with such nonsensical behavior? It seems you'd either have to wrap every call to `open` (and other cancellation points which can allocate resources) with code to disable cancellation, keep cancellation always disabled and manually call `pthread_testcancel` periodically, or come up with some way to search out and free such resources (for file descriptors, it's possible to walk all values, but pretty dangerous to do so in a threaded application without some heavy-duty locks). – R.. GitHub STOP HELPING ICE Nov 18 '10 at 20:11
  • Hmm, I read that as saying the thread may not be cancelled just yet and should instead be cancelled at the next cancellation point (in which case cancellation points need OS support, and are probably broken in the manner R.. describes on most OSes). – ninjalj Nov 18 '10 at 21:38
  • @ninjalj: I think cmeerw may be correct. The passage is saying that if (for example) the thread is suspended waiting for `open` to return, but `open` has already finished doing its job, the implementation could allow the cancellation to happen or leave it pending. – R.. GitHub STOP HELPING ICE Nov 19 '10 at 07:17
  • After much discussion, including requesting and receiving interpretations from the Austin Group (responsible for POSIX), I no longer think this answer is correct. "The event for which it is waiting" is not "opening the file" but whatever event must take place before the file can be opened (e.g. opening the other end of a FIFO). "Resuming normal execution" would be opening the file; acting on cancellation would require ensuring that the side effects match the side effects on `EINTR` (i.e. the file not being opened). – R.. GitHub STOP HELPING ICE May 04 '13 at 00:08
  • For the Austin Group reference, see http://austingroupbugs.net/view.php?id=614 and related issues. – R.. GitHub STOP HELPING ICE May 04 '13 at 00:12
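
The first workaround raised in the comments (disabling cancellation around each resource-allocating call) could look roughly like the sketch below. `open_nocancel` is a hypothetical helper name, not a POSIX or glibc function, and the sketch assumes the caller is happy to have any pending cancellation request delivered at a later cancellation point.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>

/* Hypothetical helper (not part of POSIX or glibc): call open() with
 * cancellation disabled, so a request that arrives after the syscall has
 * returned cannot make the new descriptor disappear. */
int open_nocancel(const char *path, int flags, mode_t mode)
{
    int oldstate, ignored;

    /* While cancellation is disabled, open() does not act on requests;
     * they stay pending. */
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &oldstate);

    int fd = open(path, flags, mode);
    int saved_errno = errno;   /* keep open()'s errno across the next call */

    /* Restoring the state does not itself act on a pending request. */
    pthread_setcancelstate(oldstate, &ignored);
    errno = saved_errno;
    return fd;
}
```

A caller would typically store the returned descriptor, register a cleanup handler for it, and only then call `pthread_testcancel()` so that a request held back during the call gets acted upon.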

This was acknowledged as a bug in glibc and fixed in commit 6fe99352106cf8f244418f3708b3d5928e82e831.

The POSIX text is unambiguous that side effects cannot already have happened in the case of cancellation. The text quoted in cmeerw's answer, that if the

event for which it is waiting occurs before the cancellation request is acted upon, it is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution.

allows an implementation to act on cancellation if the event being waited for (e.g. device to become available, file descriptor to become readable, etc.) has already occurred, but does not allow this if the event has already been consumed or otherwise had some side effect (e.g. opening the device and allocating a file descriptor, consuming data from a pipe or socket, etc.).
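
To show why this guarantee matters to application code, here is a minimal sketch of the usual pattern that depends on it: a cleanup handler registered immediately after `open` returns. The names `worker`, `close_fd`, and `run_cancel_demo` are made up for illustration, and the demo only cancels the thread once the handler is already in place, so it does not exercise the race itself.

```c
#include <fcntl.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t ready;       /* posted once the worker has set itself up */
static int handler_ran;   /* set by the cleanup handler */

/* Cleanup handler: releases the descriptor if the thread is cancelled. */
static void close_fd(void *arg)
{
    close(*(int *)arg);
    handler_ran = 1;
}

static void *worker(void *unused)
{
    (void)unused;
    /* open() is a cancellation point. Under the reading above, it is either
     * cancelled without allocating a descriptor, or it returns normally. */
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0) {
        sem_post(&ready);
        return NULL;
    }
    pthread_cleanup_push(close_fd, &fd);
    sem_post(&ready);         /* the handler now protects fd */
    for (;;)
        pause();              /* cancellation point: the request is acted on here */
    pthread_cleanup_pop(1);   /* never reached; pairs with the push above */
    return NULL;
}

/* Returns 0 if the worker was cancelled and its handler closed the fd. */
int run_cancel_demo(void)
{
    pthread_t t;
    void *res;

    handler_ran = 0;
    if (sem_init(&ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&ready) != 0)
        ;                     /* retry if interrupted by a signal */
    pthread_cancel(t);
    pthread_join(t, &res);
    sem_destroy(&ready);
    return (res == PTHREAD_CANCELED && handler_ran) ? 0 : -1;
}
```

If cancellation could be acted upon after the descriptor was allocated but before `open` returned, no handler would exist yet for that descriptor and it would leak, which is exactly the outcome the POSIX text rules out.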

R.. GitHub STOP HELPING ICE