Will multi-thread do write() interleaved

Question

Suppose I have two threads, thread0 and thread1, both writing to the same file descriptor fd.
thread0 does:

const char *msg = "thread0 => 0000000000\n";
write(fd, msg, strlen(msg));

thread1 does:

const char *msg = "thread1 => 111111111\n";
write(fd, msg, strlen(msg));

Will the output interleave? E.g.

thread0 => 000000111
thread1 => 111111000
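To try the scenario above, here is a minimal self-contained sketch using POSIX threads. The names `writer_arg`, `writer_thread` and `run_two_writers` are made up for illustration, and `fd` can be any descriptor open for writing (a file, a pipe, or STDOUT_FILENO).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Arguments for one writer thread: where to write and what to write. */
struct writer_arg {
    int fd;
    const char *msg;
};

/* Each thread issues exactly one write() call for its whole message,
   just like thread0 and thread1 in the question. */
static void *writer_thread(void *p)
{
    const struct writer_arg *a = p;
    ssize_t n = write(a->fd, a->msg, strlen(a->msg));
    (void)n;  /* ignored to mirror the question; real code should check it */
    return NULL;
}

/* Hypothetical helper: run thread0 and thread1 against the same fd and
   wait for both. Returns 0 on success, -1 if a thread can't be created. */
static int run_two_writers(int fd)
{
    struct writer_arg args[2] = {
        { fd, "thread0 => 0000000000\n" },
        { fd, "thread1 => 111111111\n" },
    };
    pthread_t tid[2];

    for (int i = 0; i < 2; i++) {
        if (pthread_create(&tid[i], NULL, writer_thread, &args[i]) != 0) {
            for (int j = 0; j < i; j++)  /* reap threads already started */
                pthread_join(tid[j], NULL);
            return -1;
        }
    }
    for (int i = 0; i < 2; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```

Call it from main(), e.g. `run_two_writers(STDOUT_FILENO)`. The two lines can come out in either order; whether their bytes can mix is what the answer below discusses.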

What language are you using? Add a language tag to help experts find your question. — miltonb, Apr 27 '17 at 04:31
It can if the threads start writing to the same file at the same time. — ForceBru, Apr 27 '17 at 07:11
Assuming write() maps directly to a system call (i.e. no intermediate buffering in the CRT), it's not going to happen with any I/O architecture that I know of. Then again... just don't do it. Queue the data to one write thread (a P-C, i.e. producer-consumer, queue). That's safe, and easier to debug. When designing, esp. multithreaded, 'easier to debug' always wins for me. — ThingyWotsit, Apr 27 '17 at 07:40
Then there are the other advantages of one write thread and queueing: any temporary write delays/latency because of directory updates, or comms delays on networked disks, are not inflicted on all the threads queueing the data. — ThingyWotsit, Apr 27 '17 at 07:44
@ThingyWotsit Thanks, the idea of a P-C queue is very wise. — Charles, Apr 27 '17 at 09:23
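The producer-consumer approach suggested in the comments might look roughly like the sketch below: worker threads enqueue complete lines, and one writer thread is the only thread that ever calls write() on the descriptor. This is a minimal illustration assuming a fixed-size ring buffer of fixed-size messages; the names `log_queue`, `lq_start`, `lq_push` and `lq_stop` are hypothetical, and real code would need proper error handling.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define LQ_CAP 64   /* max queued messages (arbitrary for this sketch) */
#define LQ_MSG 256  /* max bytes per message, including the NUL */

struct log_queue {
    pthread_mutex_t mu;
    pthread_cond_t not_empty, not_full;
    char msgs[LQ_CAP][LQ_MSG];
    size_t head, count;
    bool closed;
    int fd;
    pthread_t writer;
};

/* Consumer: the only thread that calls write() on q->fd. */
static void *lq_writer(void *p)
{
    struct log_queue *q = p;
    char local[LQ_MSG];
    for (;;) {
        pthread_mutex_lock(&q->mu);
        while (q->count == 0 && !q->closed)
            pthread_cond_wait(&q->not_empty, &q->mu);
        if (q->count == 0) {            /* closed and fully drained */
            pthread_mutex_unlock(&q->mu);
            return NULL;
        }
        memcpy(local, q->msgs[q->head], LQ_MSG);
        q->head = (q->head + 1) % LQ_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->mu);

        /* Slow I/O happens outside the lock, so producers are not held up
           by disk or network latency (unless the queue fills up). */
        size_t len = strlen(local), off = 0;
        while (off < len) {
            ssize_t n = write(q->fd, local + off, len - off);
            if (n <= 0)
                break;  /* sketch: a real logger would report the error */
            off += (size_t)n;
        }
    }
}

static int lq_start(struct log_queue *q, int fd)
{
    memset(q, 0, sizeof *q);
    q->fd = fd;
    pthread_mutex_init(&q->mu, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    return pthread_create(&q->writer, NULL, lq_writer, q);
}

/* Producer: copy msg into the queue, blocking while it is full.
   Must not be called after lq_stop(). */
static void lq_push(struct log_queue *q, const char *msg)
{
    pthread_mutex_lock(&q->mu);
    while (q->count == LQ_CAP)
        pthread_cond_wait(&q->not_full, &q->mu);
    size_t tail = (q->head + q->count) % LQ_CAP;
    strncpy(q->msgs[tail], msg, LQ_MSG - 1);
    q->msgs[tail][LQ_MSG - 1] = '\0';  /* over-long messages are truncated */
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
}

/* Let the writer drain what is queued, then stop it and clean up. */
static void lq_stop(struct log_queue *q)
{
    pthread_mutex_lock(&q->mu);
    q->closed = true;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mu);
    pthread_join(q->writer, NULL);
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->mu);
}
```

Because only one thread writes, ordering and non-interleaving no longer depend on kernel or file system behavior, at the cost of copying each message once.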

score 3 · Accepted Answer · edited May 23 '17 at 12:17

First, note that your question is "Will data be interleaved?", not "Are write() calls [required to be] atomic?" Those are different questions...

"TL;DR" summary:

write() calls of PIPE_BUF bytes or fewer to a pipe or FIFO won't be interleaved.
write() calls to anything else will fall somewhere in the range from "probably won't be interleaved" to "won't ever be interleaved", with the majority of implementations in the "almost certainly won't be interleaved" to "won't ever be interleaved" range.

Full Answer

If you're writing to a pipe or FIFO, your data will not be interleaved at all for write() calls of PIPE_BUF bytes or fewer.

Per the POSIX standard for write() (note the requirement on writes of {PIPE_BUF} or fewer bytes):

RATIONALE

...

An attempt to write to a pipe or FIFO has several major characteristics:

Atomic/non-atomic: A write is atomic if the whole amount written in one operation is not interleaved with data from any other process. This is useful when there are multiple writers sending data to a single reader. Applications need to know how large a write request can be expected to be performed atomically. This maximum is called {PIPE_BUF}. This volume of POSIX.1-2008 does not say whether write requests for more than {PIPE_BUF} bytes are atomic, but requires that writes of {PIPE_BUF} or fewer bytes shall be atomic.

...

The applicability of POSIX standards to Windows systems, however, is debatable at best.

So, for pipes or FIFOs, data won't be interleaved up to PIPE_BUF bytes.
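In practice that means each record should go out in a single write() call of at most PIPE_BUF bytes. A minimal sketch of a helper that enforces this follows; `pipe_atomic_limit` and `write_record` are hypothetical names, and querying the limit with fpathconf(fd, _PC_PIPE_BUF), falling back to the POSIX minimum _POSIX_PIPE_BUF (512 bytes), is one way to do it rather than the only one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Largest write that POSIX guarantees to be atomic on this pipe or FIFO. */
static long pipe_atomic_limit(int fd)
{
    long limit = fpathconf(fd, _PC_PIPE_BUF);
    /* fpathconf() returns -1 on error or when there is no fixed limit;
       fall back to the POSIX minimum, which every conforming system meets. */
    return limit > 0 ? limit : _POSIX_PIPE_BUF;
}

/* Hypothetical helper: send one record to a pipe or FIFO with a single
   write(), refusing records too large to be written atomically. */
static ssize_t write_record(int fd, const void *rec, size_t len)
{
    if (len > (size_t)pipe_atomic_limit(fd)) {
        errno = EMSGSIZE;  /* larger than the atomicity guarantee covers */
        return -1;
    }
    ssize_t n;
    do {
        /* Blocking writes of at most PIPE_BUF bytes are atomic, so an EINTR
           failure means no part of the record went out; retrying is safe. */
        n = write(fd, rec, len);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

With several threads or processes each calling write_record() on the same pipe, a reader sees every record whole, though records from different writers may arrive in any order.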

How does that apply to files?

First, file append operations have to be atomic. Per that same POSIX standard (again, note the requirement that no intervening file modification occur):

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

Also see Is file append atomic in UNIX?
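To get that behavior from user space, open the file with O_APPEND and emit each complete line with one write() call. A minimal sketch, with hypothetical helper names `open_append_log` and `append_line`:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Every write() on this descriptor goes to the current end of the file,
   with no other modification allowed between the seek and the write. */
static int open_append_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

/* Send the whole line in ONE write() call. Splitting it (e.g. writing the
   text and the '\n' separately) would produce two independent appends that
   another writer's data could land between. Returns bytes written or -1. */
static ssize_t append_line(int fd, const char *line)
{
    return write(fd, line, strlen(line));
}
```

Even if something moves the file offset with lseek() between calls, the next append_line() still lands at the end of the file, because O_APPEND repositions the offset before each write.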

So how does that apply to non-append write() calls?

Commonality of implementation. See the Linux read/write syscall implementations for an example. (Note that the "problem" is handed directly to the VFS implementation, though, so the answer might also be "It might very well depend on your file system...")

Most implementations of the write() system call inside the kernel are going to use the same code to do the actual data write for both append mode and "normal" write() calls - and for pwrite() calls, too. The only difference will be the source of the offset used - for "normal" write() calls the offset used will be the current file offset. For append write() calls the offset used will be the current end of the file. For pwrite() calls the offset used will be supplied by the caller (except that Linux is broken - it uses the current file size instead of the supplied offset parameter as the target offset for pwrite() calls on files opened in append mode. See the "BUGS" section of the Linux pwrite() man page.)
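The difference in where the offset comes from is visible from user space. A small sketch (the wrapper names `write_at_current` and `write_at_offset` are hypothetical) meant for a file opened without O_APPEND: write() uses and advances the current file offset, while pwrite() writes at the offset you pass and leaves the file offset unchanged. O_APPEND is avoided here because of the Linux pwrite() behavior described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* write(): data goes to the current file offset, which then advances. */
static ssize_t write_at_current(int fd, const char *s)
{
    return write(fd, s, strlen(s));
}

/* pwrite(): data goes to the caller-supplied offset; the file offset is
   NOT changed. (On Linux, if fd was opened with O_APPEND, pwrite()
   appends regardless of off; see the BUGS section of pwrite(2).) */
static ssize_t write_at_offset(int fd, const char *s, off_t off)
{
    return pwrite(fd, s, strlen(s), off);
}
```

For example, writing "AAAA" with write_at_current() and then "BB" at offset 1 with write_at_offset() leaves "ABBA" in the file, with the file offset still at 4.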

So appending data has to be atomic, and that same code will almost certainly be used for non-append write() calls in all implementations.

But the "write operation" in the append-must-be-atomic requirement is allowed to return less than the total number of bytes requested:

The write() function shall attempt to write nbyte bytes ...

Partial write() results are allowed even in append operations. But even then, the data that does get written must be written atomically.

What are the odds of a partial write()? That depends on what you're writing to. I've never seen a partial write() result to a file outside of the disk filling up or an actual hardware failure, nor even a partial read() result. I can't see any way for a write() operation that has all its data on a single page in kernel memory to result in a partial write() in anything other than a disk-full or hardware-failure situation.
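If you do want to handle a partial write(), the usual pattern is to loop on the remainder. Note the trade-off: each retry is a separate write() call, so only each individual call carries whatever atomicity guarantee applies, and the tail of a split write could in principle land after another thread's data. A minimal sketch (`write_all` is a common convention, not a standard function; treating a zero-byte result as EIO is an assumption):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, looping over short writes and EINTR.
   *was_split becomes true once more than one successful write() call
   was needed, i.e. the data went out in pieces that are NOT atomic as
   a whole. Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len, bool *was_split)
{
    const char *p = buf;
    int calls = 0;

    *was_split = false;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing anything */
            return -1;          /* errno set by write() */
        }
        if (n == 0) {
            errno = EIO;        /* assumption: treat "no progress" as an error */
            return -1;
        }
        *was_split = (++calls > 1);
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Checking was_split lets a caller at least detect the rare case where a record may have been interleaved.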

If you look at Is file append atomic in UNIX? again, you'll see that actual testing shows that append write() operations are in fact atomic.
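A test along those lines is easy to reproduce. In the sketch below (all names hypothetical), two threads append the question's two lines many times each through one shared O_APPEND descriptor, then the file is read back and every line that is not exactly one of the two messages is counted. A result of zero on your system is evidence, not proof, and says nothing about other kernels or file systems.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* The two lines from the question. */
static const char *const MSGS[2] = {
    "thread0 => 0000000000\n",
    "thread1 => 111111111\n",
};

struct append_job {
    int fd;
    const char *msg;
    int reps;
};

/* Append job->msg reps times, one write() call per line. */
static void *append_worker(void *p)
{
    const struct append_job *job = p;
    size_t len = strlen(job->msg);
    for (int i = 0; i < job->reps; i++)
        if (write(job->fd, job->msg, len) != (ssize_t)len)
            break;  /* error or short write: stop; shows up in the line count */
    return NULL;
}

/* Returns the number of lines that are not exactly one of MSGS (0 means no
   interleaving was observed), or -1 on setup failure. *lines receives the
   total number of lines read back. */
static long count_interleaved(const char *path, int reps, long *lines)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    struct append_job jobs[2] = { { fd, MSGS[0], reps }, { fd, MSGS[1], reps } };
    pthread_t tid[2];
    int started = 0;
    while (started < 2 &&
           pthread_create(&tid[started], NULL, append_worker, &jobs[started]) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    if (started < 2) {
        close(fd);
        return -1;
    }

    /* Read the whole file back. */
    off_t size = lseek(fd, 0, SEEK_END);
    char *buf = size > 0 ? malloc((size_t)size) : NULL;
    if (size < 0 || (size > 0 && buf == NULL)) {
        close(fd);
        return -1;
    }
    size_t got = 0;
    while (got < (size_t)size) {
        ssize_t n = pread(fd, buf + got, (size_t)size - got, (off_t)got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fd);

    /* Check every '\n'-terminated line (plus any unterminated tail). */
    long bad = 0;
    *lines = 0;
    for (size_t start = 0; start < got;) {
        char *nl = memchr(buf + start, '\n', got - start);
        size_t end = nl ? (size_t)(nl - buf) + 1 : got;
        size_t len = end - start;
        int ok = 0;
        for (int m = 0; m < 2; m++)
            if (len == strlen(MSGS[m]) && memcmp(buf + start, MSGS[m], len) == 0)
                ok = 1;
        bad += !ok;
        (*lines)++;
        start = end;
    }
    free(buf);
    return bad;
}
```

For example, `count_interleaved("/tmp/test.log", 1000, &lines)` should report 2000 lines, and on Linux with a local file system the bad-line count is expected to be 0.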

So the answer to "Will multi-thread do write() interleaved?" is, "No, the data will almost certainly not be interleaved for writes that are at or under 4 KB (a typical page size) as long as the data does not cross a page boundary in kernel space." And even crossing a page boundary probably doesn't change the odds all that much.

If you're writing small chunks of data, it depends on your willingness to deal with the almost-certain-to-never-happen-but-it-might-anyway result of interleaved data. If it's a text log file, I'd opine that it won't matter anyway.

And note that it's not likely to be any faster to use multiple threads to write to the same file - the kernel is likely going to lock things and effectively single-thread the actual write() calls anyway to ensure it can meet the atomicity requirements of writing to a pipe and appending to a file.

Regarding pipes and FIFOs, "A write is atomic if [...] not interleaved with data from any other **process**", but the question actually asks about different threads in the *same* process. Does POSIX, in your opinion, guarantee atomicity of writes from the same process? — trent, Aug 25 '20 at 17:45
@trent - Did you find an answer to this? I'd like to know as well. — user2205930, Aug 19 '22 at 15:13

