10

Assuming I have opened /dev/poll as mDevPoll, is it safe for me to call code like this:

struct pollfd tmp_pfd;
tmp_pfd.fd = fd;
tmp_pfd.events = POLLIN;

// Write pollfd to /dev/poll
write(mDevPoll, &tmp_pfd, sizeof(struct pollfd));

...simultaneously from multiple threads, or do I need to add my own synchronisation primitive around mDevPoll?

Wad
  • I am pretty sure that you will need to do your own synchronization. My experience with I/O in multi-threaded applications is that I/O is not thread safe. http://stackoverflow.com/questions/19974548/are-functions-in-the-c-standard-library-thread-safe has some discussion about thread safety and the C Standard Library, though it does not discuss `write()` specifically. Then there is this article, "write(), thread safety, and POSIX": https://lwn.net/Articles/180387/ – Richard Chambers Feb 24 '17 at 15:34
  • 2
    `write()` is threadsafe. However, whether concurrent `write()` from multiple threads to the same file-descriptor is atomic is another question. `pwrite()` to separate parts of the file presumably is, and `writev()` is, provided the write is not too large. – EOF Feb 24 '17 at 15:39
  • 4
    `write()` is in the list by POSIX of functions that if "two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them." http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07 – Richard Chambers Feb 24 '17 at 15:42
  • @RichardChambers only on regular files... – Jean-Baptiste Yunès Feb 24 '17 at 15:44
  • And it may also depend on whether your compiler is C11 compliant or not. See the C11 related answer concerning streams and thread safety here http://stackoverflow.com/questions/467938/stdout-thread-safe-in-c-on-linux – Richard Chambers Feb 24 '17 at 15:53
  • Keep in mind that it is faster and more scalable to do non-blocking I/O using one thread instead of multiple threads ([more info](http://stackoverflow.com/questions/8546273/is-non-blocking-i-o-really-faster-than-multi-threaded-blocking-i-o-how)). – rustyx Feb 24 '17 at 15:59
  • @RustyX In this case, though, the question is about writing to Solaris `/dev/poll` - a polling feature used for massive scalability in multithreaded programs. Any extraneous locking in this case is something that likely needs to be avoided. – Andrew Henle Feb 24 '17 at 16:20
  • 2
    After reviewing the `/dev/poll` device driver source code, I'm comfortable declaring that a `write()` call to `/dev/poll` device is thread-safe. There doesn't appear to be any way for the code to return a partial operation. – Andrew Henle Feb 25 '17 at 01:29

2 Answers

13

Solaris 10 claims to be POSIX compliant. The write() function is not among the handful of system interfaces that POSIX permits to be non-thread-safe, so we can conclude that on Solaris 10, it is safe in a general sense to call write() simultaneously from two or more threads.

POSIX also designates write() among those functions whose effects are atomic relative to each other when they operate on regular files or symbolic links. Specifically, it says that

If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them.

If your writes were directed to a regular file then that would be sufficient to conclude that your proposed multi-thread actions are safe, in the sense that they would not interfere with one another, and the data written in one call would not be commingled with that written by a different call in any thread. Unfortunately, /dev/poll is not a regular file, so that does not apply directly to you.

You should also be aware that write() is not in general required to transfer the full number of bytes specified in a single call. For general purposes, one must therefore be prepared to transfer the desired bytes over multiple calls, by using a loop. Solaris may provide applicable guarantees beyond those expressed by POSIX, perhaps specific to the destination device, but absent such guarantees it is conceivable that one of your threads performs a partial write, and the next write is performed by a different thread. That very likely would not produce the results you want or expect.
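The loop described above can be sketched as follows. This is only a minimal illustration for a generic file descriptor, not code from the question; the helper name `write_all` is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes have been
 * transferred. Returns 0 on success, -1 on error (errno is set by write()). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data moved: retry */
            return -1;
        }
        p += n;                 /* partial write: skip what was already sent */
        len -= (size_t)n;
    }
    return 0;
}
```

Note that each pass through the loop is a separate write() call, so another thread's write() to the same descriptor can land between two passes unless the callers coordinate.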

John Bollinger
  • 1
    I'll add what I noted in my answer and in the comments to that: in this case (`write()` to `/dev/poll`), it almost certainly *is* safe. And it better be safe because some of my code implements just what the OP wants, and I delivered that code a decade ago - and it's been working fine ever since. Unfortunately, I don't recall and didn't document why I didn't lock the `write()`. But I plan on going through the Illumos source code later today to be sure. – Andrew Henle Feb 24 '17 at 16:37
  • 1
    Ouch. I'm no longer so certain - a quick look at `man poll.7d` on my Solaris 11 box shows an example that errors out on a partial write to `/dev/poll`. Now I *know* I'm going to have to examine the published source code. – Andrew Henle Feb 24 '17 at 16:46
  • Andrew, I'd appreciate an update with what you find when you look through the source code. For the meantime though, since I cannot assume that my `write()` will be done in a single call, I'm going to stick with locking around the `write()` calls. Thanks. – Wad Feb 24 '17 at 17:38
8

It's not safe in theory, even though write() is completely thread-safe (barring implementation bugs...). Per the POSIX write() standard (emphasis mine):

The write() function shall attempt to write nbyte bytes from the buffer pointed to by buf to the file associated with the open file descriptor, fildes.

...

RETURN VALUE

Upon successful completion, these functions shall return the number of bytes actually written ...

There is no guarantee that you won't get a partial write(), so even if each individual write() call is atomic, it's not necessarily complete, so you could still get interleaved data because it may take more than one call to write() to completely write all data.
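One conservative way to rule out that interleaving is to hold a lock across the whole retry loop, so that a record needing several write() calls still goes out as a unit. The sketch below assumes POSIX threads; `write_lock` and `locked_write` are hypothetical names, and every thread writing to the descriptor would have to go through the same lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock shared by all threads that write to the same descriptor. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write len bytes as one unit with respect to other locked_write() callers.
 * Returns 0 on success, -1 on error. */
static int locked_write(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    int rc = 0;

    pthread_mutex_lock(&write_lock);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* retry while still holding the lock */
            rc = -1;
            break;
        }
        p += n;                 /* partial write: continue with the rest */
        len -= (size_t)n;
    }
    pthread_mutex_unlock(&write_lock);
    return rc;
}
```

The cost is that all writers serialize on the lock, even when the underlying write() would have completed in a single call anyway.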

In practice, if you're only doing relatively small write() calls, you will likely never see a partial write(), with "small" and "likely" being indeterminate values dependent on your implementation.

I've routinely delivered code that uses unlocked single write() calls on regular files opened with O_APPEND in order to improve the performance of logging - build a log entry then write() the entire entry with one call. I've never seen a partial or interleaved write() result over almost a couple of decades of doing that on Linux and Solaris systems, even when many processes write to the same log file. But then again, it's a text log file and if a partial or interleaved write() does happen there would be no real damage done or even data lost.
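A minimal sketch of that logging pattern, assuming a fixed-size entry buffer; `open_log` and `log_line` are hypothetical names, not the code I actually shipped:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (or create) a log file in append mode. */
static int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Hypothetical helper: build one complete entry, then emit it with a single
 * unlocked write(). Returns 0 on success, -1 otherwise. */
static int log_line(int log_fd, const char *msg)
{
    char entry[512];
    int len = snprintf(entry, sizeof entry, "%s\n", msg);

    if (len < 0 || (size_t)len >= sizeof entry)
        return -1;              /* formatting failed or entry was truncated */

    /* A short count is reported as failure instead of being retried, because
     * a retry would be a second write() that other writers could get between. */
    return write(log_fd, entry, (size_t)len) == (ssize_t)len ? 0 : -1;
}
```

With O_APPEND each write() lands at the current end of the file, which is why building the whole entry first and issuing one call matters.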

In this case, though, you're "writing" a handful of bytes to a kernel structure. You can dig through the Solaris /dev/poll kernel driver source code at Illumos.org and see how likely a partial write() is. I'd suspect it's practically impossible - because I just went back and looked at the multiplatform poll class that I wrote for my company's software library a decade ago. On Solaris it uses /dev/poll and unlocked write() calls from multiple threads. And it's been working fine for a decade...

Solaris /dev/poll Device Driver Source Code Analysis

The (Open)Solaris source code can be found here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/devpoll.c#628

The dpwrite() function is the code in the /dev/poll driver that actually performs the "write" operation. I use quotes because it's not really a write operation at all - data isn't transferred as much as the data in the kernel that represents the set of file descriptors being polled is updated.

Data is copied from user space into kernel space - to a memory buffer obtained with kmem_alloc(). I don't see any possible way that can be a partial copy. Either the allocation succeeds or it doesn't. The code can get interrupted before doing anything, as it waits for exclusive write() access to the kernel structures.

After that, the last return call is at the end - and if there's no error, the entire call is marked successful, or the entire call fails on any error:

    if (error == 0) {
        /*
         * The state of uio_resid is updated only after the pollcache
         * is successfully modified.
         */
        uioskip(uiop, copysize);
    }
    return (error);
}

If you dig through Solaris kernel code, you'll see that uio_resid is what determines the value returned by write() after a successful call.

So the call certainly appears to be all-or-nothing. While there appear to be ways for the code to return an error on a file descriptor after successfully processing an earlier descriptor when multiple descriptors are passed in, the code doesn't appear to return any partial success indications.

If you're only processing one file descriptor at a time, I'd say the /dev/poll write() operation is completely thread-safe, and it's almost certainly thread-safe for "writing" updates to multiple file descriptors as there's no apparent way for the driver to return a partial write() result.
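For the single-descriptor case, the all-or-nothing behaviour suggests a check like the one below. It is a sketch under the assumption that the driver behaves as analysed above; `devpoll_add` is a hypothetical helper, and `devpoll_fd` is assumed to come from opening /dev/poll.

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: add fd to the interest set behind devpoll_fd.
 * Returns 0 if the whole pollfd was accepted, -1 otherwise. */
static int devpoll_add(int devpoll_fd, int fd, short events)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = events;
    pfd.revents = 0;            /* do not hand uninitialised bytes to the driver */

    /* One pollfd per call: anything other than the full size is treated as failure. */
    ssize_t n = write(devpoll_fd, &pfd, sizeof pfd);
    return n == (ssize_t)sizeof pfd ? 0 : -1;
}
```

A caller would use it as `devpoll_add(mDevPoll, fd, POLLIN)` and treat a -1 result as a failed registration rather than retrying the remainder.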

Andrew Henle
  • 2
    This does not mean that write is not thread safe! write is explicitly specified as thread safe in the standard (*If two threads each call one of these functions, each call shall either see all of the specified effects of the other call, or none of them*, meaning it is thread safe, **no overlapping of their effects**). write may not write all that was requested of course, because there are a lot of conditions that may lead to a partial failure (not enough space, etc). – Jean-Baptiste Yunès Feb 24 '17 at 16:02
  • 1
    @Jean-BaptisteYunès Oh, absolutely. `write()` is certainly thread-safe. The problem is that a partial `write()` could require multiple calls in order to completely write the data, and while that is "thread-safe" it could result in interleaved data. In practice, it depends on how much risk you're willing to take in order to get slightly better performance from simpler and easier-to-maintain code. I'll add some clarification. – Andrew Henle Feb 24 '17 at 16:07
  • 3
    But in that case this is not a single write! What a single write writes is coherent against other "concurrent" writes, that's the point. Ensuring maximum failsafe semantics would necessitate some kind of costly transactional semantics. – Jean-Baptiste Yunès Feb 24 '17 at 16:15
  • @Jean-Baptiste Yunès The way write is specified means it's impossible to write correct software that allows multiple writes at the same time, since you cannot assign semantic meaning to any group of bytes (only single bytes, so yes, nitpickers, there are theoretically some programs). So for all practical purposes it's the same as if two writers were forbidden. The guarantee is useful when having one writer and multiple readers though. – Voo Feb 24 '17 at 16:22
  • @Jean-BaptisteYunès Yes, it's because there's no absolute guarantee that it won't take multiple `write()` calls to completely write all data. In this case, though, given that the `write()` is just a handful of bytes to a kernel structure, a partial `write()` is almost certainly **never** going to happen. – Andrew Henle Feb 24 '17 at 16:22
  • 1
    @Jean-BaptisteYunès, when they are directed to anything other than a regular file (or a symlink to the same), POSIX *does not* require concurrent `write()`s to be "coherent", inasmuch as I take you to mean what the standard describes as "atomic". That's not a required aspect of the function's thread-safe character alone. – John Bollinger Feb 24 '17 at 16:29
  • 1
    @JohnBollinger And in this question, the "target" of the `write()` is about as far from a regular file as possible - it's updating a structure inside the kernel. So while it doesn't *have* to be atomic and complete, in this case it's pretty hard to imagine *how* a partial or non-coherent `write()` could even happen. It better be safe in this case - some of my code's been doing it for about a decade now... ;-) (And now, when I get the time later today, I'm going to go through the Illumos source code and see if it really is safe...) – Andrew Henle Feb 24 '17 at 16:33
  • @JohnBollinger You're right, I forgot to say it just concerns regular files. – Jean-Baptiste Yunès Feb 24 '17 at 19:11
  • Thanks Andrew. So just to clarify, you are saying that two threads (running on different cores) could call `write()` simultaneously without problem? – Wad Feb 25 '17 at 16:00
  • 1
    @Wad In this case? Writing to `/dev/poll` on Solaris? Yes. Because there doesn't seem to be any way to get a partial result. Especially the way you seem to be doing the `write()` with just one file descriptor at a time. Returning a result indicative of "half a file descriptor" would be literal nonsense. – Andrew Henle Feb 25 '17 at 19:11
  • OK, so what happens if two threads make the write() at precisely the same time: will something internal make sure they are "sequenced" so one fully completes before the other starts? – Wad Feb 25 '17 at 19:40
  • 1
    @Wad There are multiple mutexes in the `/dev/poll` driver. You can trace the usage of the mutexes starting here: http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/devpoll.c#722 The code certainly appears to serialize concurrent `write()` calls. – Andrew Henle Feb 27 '17 at 15:31