
I was reading APUE (Advanced Programming in the UNIX Environment) and came across this question in §3.11:

```c
/* err_sys() is APUE's print-error-and-exit helper */
if (lseek(fd, 0L, SEEK_END) < 0)    /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100)     /* and write */
    err_sys("write error");
```

APUE says:

This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. .......The problem here is that our logical operation of ‘‘position to the end of file and write’’ requires two separate function calls (as we’ve shown it). Any operation that requires more than one function call cannot be atomic, as there is always the possibility that the kernel might temporarily suspend the process between the two function calls.
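The failure APUE describes can be replayed deterministically in one process by forcing the unlucky schedule by hand: both writers position to EOF, and only then do both write. The sketch below assumes that two independent `open()` calls are a fair stand-in for two processes (each `open()` gets its own file offset); the function name and temp-file template are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Replays the bad interleaving: both writers seek to EOF before either
 * writes. Two separate open() calls stand in for two processes.
 * Copies the final file contents into out (NUL-terminated) and returns
 * the number of bytes read, or -1 on error. */
ssize_t replay_lseek_write_race(char *out, size_t outsz)
{
    char path[] = "/tmp/apue_race_XXXXXX";   /* hypothetical temp name */
    int fd1 = mkstemp(path);                  /* writer A */
    if (fd1 < 0)
        return -1;
    int fd2 = open(path, O_WRONLY);           /* writer B, own offset */
    if (fd2 < 0) {
        close(fd1);
        unlink(path);
        return -1;
    }

    /* Both see EOF at offset 0 of the empty file before anyone writes,
     * so B's bytes land on top of A's instead of after them. */
    int ok = lseek(fd1, 0L, SEEK_END) == 0
          && lseek(fd2, 0L, SEEK_END) == 0
          && write(fd1, "aaaaa", 5) == 5
          && write(fd2, "bbbbb", 5) == 5;
    close(fd1);
    close(fd2);

    int rfd = ok ? open(path, O_RDONLY) : -1;
    unlink(path);                             /* name no longer needed */
    if (rfd < 0)
        return -1;
    ssize_t n = read(rfd, out, outsz - 1);
    close(rfd);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

With a 64-byte buffer the function returns 5 and the buffer holds `"bbbbb"`: writer A's five bytes were silently overwritten, which is the lost append APUE warns about.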

It only says the CPU may switch away between the lseek and write calls. I want to know whether it can also switch in the middle of a write operation. In other words, is write atomic? If thread A writes "aaaaa" and thread B writes "bbbbb", could the result be "aabbbbbaaa"?

Furthermore, APUE then says that pread and pwrite are atomic operations. Does that mean these functions use a mutex or lock internally to be atomic?

choxsword
  • **APUE**? Do you have a reference, please? – Mark Setchell May 31 '18 at 12:20
  • @MarkSetchell [**Advanced Programming in the UNIX Environment** by Richard Stevens and Stephen A. Rago](http://www.apuebook.com/) – choxsword May 31 '18 at 12:22
  • I imagine it isn't, simply because the size of the buffer you pass into `write` could easily exceed the "MTU" (or equivalent) of the media you're writing to (be it a network socket, local disk, etc.), so it may take multiple internal writes, any of which may fail. Additionally, IO is inherently asynchronous, with potentially very long waits until completion. So no, despite `write`'s _apparent_ simplicity, it provides many opportunities for the OS's scheduler to preempt your thread or process with another that uses the same IO resource (fd, filename, socket, etc.) – Dai May 31 '18 at 12:29
  • @Dai I think the core problem is whether an IO operation can be interrupted before it completes, that is, whether the result can be `"aabbbbbaaa"`. – choxsword May 31 '18 at 12:39
  • Writes are atomic. Period. If two processes write (to the same file), both writes are atomic. If they happen to overlap, the second one overwrites part of the first's write (or vice versa). It is the responsibility of the *user process(es)* to handle this overlapping case. The kernel only guarantees atomic *single* writes. – joop May 31 '18 at 13:30
  • I closed this as a dupe because there are excellent answers in the linked question. – rici May 31 '18 at 13:32
  • @joop While `write()` may be atomic, there's no guarantee that it's *complete*. Per [the POSIX `write()` documentation](http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html): "The `write()` function shall attempt to write `nbyte` bytes..." Although a "short" write is admittedly not something I've ever seen when writing to a local file. – Andrew Henle May 31 '18 at 13:59
  • @andrewHenle: a short write is not a failure, and it is required in the case where the write attempts to extend the file beyond the capacity of the filesystem: "If a write() requests that more bytes be written than there is room for (for example, the process' file size limit or the physical end of a medium), only as many bytes as there is room for shall be written." It is a failure only if no bytes can be written. – rici May 31 '18 at 14:09
  • @rici I didn't state that a short write is a failure. I was just pointing out that saying a write operation is atomic isn't sufficient to guarantee all the data requested to be written will actually be written in one operation. The fact that a short write **isn't** considered a failure demonstrates that. – Andrew Henle May 31 '18 at 14:25
  • @andrew: Sorry, I guess I misunderstood the point of your comment. I wouldn't say that writes are atomic at all (except as provided for small writes to pipes and FIFOs), and the possibility of short writes is one of the reasons. Successful writes are sequenced (in some manner), and since short writes are successful, they are also performed in some kind of sequence. But that's far from a claim about atomicity. – rici May 31 '18 at 14:29
  • @rici But the top-voted answer in the question you linked implies that `write` is atomic in POSIX but that some filesystems do not conform to POSIX, which is different from what you answered. – choxsword Jun 01 '18 at 11:53
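Because a successful `write()` may transfer fewer bytes than requested (the short-write case Andrew Henle and rici discuss), callers that need every byte usually loop. The following is a minimal sketch of that conventional loop; `write_all` is a hypothetical helper, not a POSIX function. Note that it restores *completeness*, not atomicity: another writer can still land between two of its `write()` calls.

```c
#include <errno.h>
#include <unistd.h>

/* Keeps calling write() until all len bytes are written or a real error
 * occurs. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;     /* interrupted before any data moved: retry */
            return -1;        /* real error; errno is set by write() */
        }
        if (n == 0)
            return -1;        /* no progress; give up rather than spin */
        p += n;               /* short write: skip what was written */
        len -= (size_t)n;
    }
    return 0;
}
```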

2 Answers


To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)

The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be performed at the end of the file:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
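To see what that guarantee buys, here is the same hand-forced schedule as the lost-append race, but with both descriptors opened with `O_APPEND`. This is a sketch under the same assumption (two `open()` calls standing in for two processes); the function name and temp-file template are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Both writers seek to EOF first, then write, exactly as in the race,
 * but O_APPEND makes the kernel move each write to the current EOF.
 * Copies the final contents into out (NUL-terminated) and returns the
 * number of bytes read, or -1 on error. */
ssize_t replay_with_o_append(char *out, size_t outsz)
{
    char path[] = "/tmp/apue_append_XXXXXX";   /* hypothetical temp name */
    int tmp = mkstemp(path);                    /* only creates the file */
    if (tmp < 0)
        return -1;
    close(tmp);

    int fd1 = open(path, O_WRONLY | O_APPEND);  /* writer A */
    int fd2 = open(path, O_WRONLY | O_APPEND);  /* writer B */
    int ok = fd1 >= 0 && fd2 >= 0
          /* These seeks no longer matter: O_APPEND repositions each
           * write to EOF with no intervening modification. */
          && lseek(fd1, 0L, SEEK_END) == 0
          && lseek(fd2, 0L, SEEK_END) == 0
          && write(fd1, "aaaaa", 5) == 5
          && write(fd2, "bbbbb", 5) == 5;
    if (fd1 >= 0)
        close(fd1);
    if (fd2 >= 0)
        close(fd2);

    int rfd = ok ? open(path, O_RDONLY) : -1;
    unlink(path);
    if (rfd < 0)
        return -1;
    ssize_t n = read(rfd, out, outsz - 1);
    close(rfd);
    if (n >= 0)
        out[n] = '\0';
    return n;
}
```

Here the function returns 10 and the file holds `"aaaaabbbbb"`: B's stale offset of 0 is ignored, so nothing is lost.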

With respect to pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
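A small sketch of that property: `pwrite()` takes its position as an argument and leaves the descriptor's file offset alone, so there is no separate seek for another writer to interfere with. The function name and temp-file template below are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/* Writes "hello" with write(), then overwrites the first two bytes with
 * pwrite() at an explicit offset. Copies the file contents into out
 * (NUL-terminated) and returns the file offset seen after the pwrite(),
 * or -1 on error. */
off_t pwrite_keeps_offset(char *out, size_t outsz)
{
    char path[] = "/tmp/apue_pwrite_XXXXXX";   /* hypothetical temp name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                               /* file lives until close */

    off_t pos = -1;
    if (write(fd, "hello", 5) == 5 && pwrite(fd, "XY", 2, 0) == 2) {
        pos = lseek(fd, 0, SEEK_CUR);           /* still 5: pwrite did not move it */
        ssize_t n = pread(fd, out, outsz - 1, 0);  /* read from offset 0 */
        if (n >= 0)
            out[n] = '\0';
        else
            pos = -1;
    }
    close(fd);
    return pos;
}
```

The function returns 5 and the buffer holds `"XYllo"`: the data changed at offset 0, but the persistent offset stayed where `write()` left it.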

The Posix sequencing guarantee is as follows (from the Description of the write() and pwrite() functions):

After a write() to a regular file has successfully returned:

  • Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

  • Any subsequent successful write() to the same byte position in the file shall overwrite that file data.
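The two guarantees can be illustrated with one descriptor writing and an independent descriptor reading, as two unrelated processes would. This is only an illustration in a single process, not a proof of cross-process behavior, and the function name and temp-file template are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* first_view receives what the reader sees after the first write();
 * second_view what it sees after a later write() to the same bytes.
 * Both buffers must hold at least 6 bytes. Returns 0 on success. */
int sequencing_demo(char first_view[6], char second_view[6])
{
    char path[] = "/tmp/apue_seq_XXXXXX";   /* hypothetical temp name */
    int wfd = mkstemp(path);
    if (wfd < 0)
        return -1;
    int rfd = open(path, O_RDONLY);          /* independent reader */
    unlink(path);
    if (rfd < 0) {
        close(wfd);
        return -1;
    }

    int ok =
        /* Guarantee 1: once write() has returned, reads see the data. */
        write(wfd, "first", 5) == 5
        && pread(rfd, first_view, 5, 0) == 5
        /* Guarantee 2: a later write() to the same bytes overwrites them. */
        && lseek(wfd, 0, SEEK_SET) == 0
        && write(wfd, "XX", 2) == 2
        && pread(rfd, second_view, 5, 0) == 5;
    close(wfd);
    close(rfd);
    if (!ok)
        return -1;
    first_view[5] = second_view[5] = '\0';
    return 0;
}
```

The reader sees `"first"` and then `"XXrst"`, even though the data may still be sitting in kernel buffers rather than on the disk.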

As mentioned in the Rationale, this wording does guarantee that two simultaneous write calls (even in different unrelated processes) will not interleave data, because if data were interleaved during a write which will eventually succeed, the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.

It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, makes it impossible for the kernel itself to provide guarantees about write which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as is mentioned as well by Posix (at the end of the paragraph quoted from the Rationale):

This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.

The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.

Since writing to regular files is typically accomplished via kernel buffers, and actually synching the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantees don't have to be very long-lasting. But they must be done inside the filesystem, because nothing in the Posix wording limits the guarantees to simultaneous writes within a single-threaded process.

Within a multithreaded process, Posix does require read(), write(), pread() and pwrite() to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.
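A sketch of what that requirement means in practice: two threads each issue a single `write()` on the same descriptor, without `O_APPEND`. On a conforming system each call sees all or none of the other's effects, so the shared offset is advanced atomically and the result is one block after the other, in either order. The names below are hypothetical, and older kernels are known not to have honored this for shared file offsets, so treat it as an illustration, not a portability promise. Compile with `-pthread`.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct job {
    int fd;              /* descriptor shared by both threads */
    const char *data;    /* bytes written with a single write() */
    ssize_t written;     /* result of that write() */
};

static void *writer(void *arg)
{
    struct job *j = arg;
    j->written = write(j->fd, j->data, strlen(j->data));
    return NULL;
}

/* Copies the final contents into out (NUL-terminated) and returns the
 * number of bytes read back, or -1 on error. */
ssize_t threaded_writes(char *out, size_t outsz)
{
    char path[] = "/tmp/apue_threads_XXXXXX";   /* hypothetical temp name */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    struct job a = { fd, "aaaaa", -1 }, b = { fd, "bbbbb", -1 };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, writer, &a) != 0) {
        close(fd);
        return -1;
    }
    if (pthread_create(&tb, NULL, writer, &b) != 0) {
        pthread_join(ta, NULL);
        close(fd);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    ssize_t n = -1;
    if (a.written == 5 && b.written == 5) {
        n = pread(fd, out, outsz - 1, 0);       /* does not touch the offset */
        if (n >= 0)
            out[n] = '\0';
    }
    close(fd);
    return n;
}
```

On a conforming system the buffer holds either `"aaaaabbbbb"` or `"bbbbbaaaaa"`; which one depends on scheduling, but `"aabbbbbaaa"` should not occur.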

rici
  • _"If two threads each call [the write() function], each call shall either see all of the specified effects of the other call, or none of them."_ (quoted from the duplicate question) – choxsword Jun 01 '18 at 11:55
  • @bigxiao: in a single process, `write` is atomic. I link to that section of Posix in the last paragraph of my answer. But IMHO that doesn't really make the call atomic because `write` acts on a global object visible beyond the context of a single process. Between processes, the guarantee is weaker, although there is still some kind of guarantee. – rici Jun 01 '18 at 12:08
  • @bigxiao: I accept that this is semantics. I didn't say in my first para "`write` is not atomic". What I said was that calling it "atomic" is an oversimplification. There are a lot of little corner cases. – rici Jun 01 '18 at 12:12
  • So within a single multithreaded process it's guaranteed to be atomic, but not across processes? – choxsword Jun 01 '18 at 12:15
  • @bigxiao: between processes, what posix says is that if you can prove a read happens after a write, the read will see the results of that write. That's an important guarantee, but I think it is not as strong as the word "atomic" might make you believe. (Also, it's not true, really. The filesystem/host might crash before the update is committed to physical storage; after reboot, the `write()` might no longer be visible. I'm not saying there's anything wrong with that; just that it might not fit into everyone's idea of what "atomic" means.) – rici Jun 01 '18 at 12:48
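If a write must survive the kind of crash rici mentions, POSIX `fsync()` asks the system to transfer the file's data to the storage device before returning. The sketch below is hypothetical (the `save_durably` name is an illustration, not a standard call); it adds durability only and does nothing for atomicity or for interleaving with other writers.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Writes buf to path and flushes it to stable storage.
 * Returns 0 only if the write, the fsync and the close all succeed. */
int save_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, buf, len);
    /* A short write counts as failure here; a real program might loop. */
    int rc = (n >= 0 && (size_t)n == len && fsync(fd) == 0) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```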

In Linux there are blocking and non-blocking system calls. write is an example of a blocking system call, which means the calling thread is blocked until the write completes. So once the user process calls write, it cannot execute anything else until the system call returns. From the user thread's perspective, it therefore behaves as if it were atomic [although at the kernel level a lot of things can happen, and the kernel's execution of the system call can be interrupted many times].

Ketan Mukadam
  • What if the CPU has multiple cores and different threads run on different cores (hardware concurrency)? Are all the cores blocked at user level? – choxsword May 31 '18 at 12:36
  • A thread can only execute on one core at any instant. The same system call can be executed on many cores by different threads, and each of those threads is blocked. The kernel has locks to ensure multiple instances of the same system call do not corrupt common resources. – Ketan Mukadam May 31 '18 at 12:38
  • So the result won't be `"aabbbbbaaa"` in any circumstance? – choxsword May 31 '18 at 12:40
  • cores are not blocked but threads are blocked, which means scheduler will decide to run something else on the cores. – Ketan Mukadam May 31 '18 at 12:40
  • The write corruption can still occur if multiple threads are accessing the same file, since the kernel cannot guarantee sequential access to a file across threads. As I understood it, your question was about the user process doing something else while `write` is called. – Ketan Mukadam May 31 '18 at 12:42
  • So before one thread's IO write completes, another thread's IO write won't execute? – choxsword May 31 '18 at 12:44
  • My question assumes multiple threads are accessing the same file, and that's what APUE is talking about. – choxsword May 31 '18 at 12:47