Redirected output from a subprocess call getting lost?

Question

I have some Python code that goes roughly like this, using some libraries that you may or may not have:

import subprocess

# Open it for writing
vcf_file = open(local_filename, "w")

# Download the region to the file.
subprocess.check_call(["bcftools", "view",
    options.truth_url.format(sample_name), "-r",
    "{}:{}-{}".format(ref_name, ref_start, ref_end)], stdout=vcf_file)

# Close parent process's copy of the file object
vcf_file.close()

# Upload it
file_id = job.fileStore.writeGlobalFile(local_filename)

Basically, I'm starting a subprocess that's supposed to go download some data for me and print it to standard out. I'm redirecting that data to a file, and then, as soon as the subprocess call returns, I'm closing my handle to the file and then copying the file elsewhere.

I'm observing that, sometimes, the tail end of the data I'm expecting isn't making it into the copy. Now, it's possible that bcftools is just occasionally not writing that data, but I'm worried that I might be doing something unsafe and somehow getting access to the file after subprocess.check_call() has returned, but before the data that the child process writes to standard output makes it onto the disk where I can see it.

Looking at the C++ standard (since bcftools is implemented in C/C++), it looks like when a program exits normally, all open streams (including standard output) are flushed and closed. See the [lib.support.start.term] section here, describing the behavior of exit(), which is called implicitly when main() returns:

--Next, all open C streams (as mediated by the function signatures declared in <cstdio>) with unwritten buffered data are flushed, all open C streams are closed, and all files created by calling tmpfile() are removed.

--Finally, control is returned to the host environment. If status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.

So before the child process exits, it closes (and thus flushes) standard output.
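The flush-on-normal-exit behavior can be observed from Python as well. The following is only a sketch: it uses a child Python interpreter as a stand-in for bcftools (an assumption, not the real tool), and the names `NORMAL_EXIT`, `ABRUPT_EXIT`, and `run_child` are made up for illustration. One child exits normally; the other calls `os._exit()`, which skips the usual stdio cleanup, so its buffered output is typically lost unless buffering has been disabled (for example via `PYTHONUNBUFFERED`).

```python
import os
import subprocess
import sys
import tempfile

# Child that exits normally: the interpreter flushes sys.stdout on the way out.
NORMAL_EXIT = "import sys; sys.stdout.write('data')"
# Child that calls os._exit(), which skips the normal stdio cleanup.
ABRUPT_EXIT = "import os, sys; sys.stdout.write('data'); os._exit(0)"


def run_child(code):
    """Run `code` in a child Python with stdout redirected to a temp file.

    Returns the bytes found in the file after the child has exited.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            subprocess.check_call([sys.executable, "-c", code], stdout=out)
        with open(path, "rb") as result:
            return result.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("normal exit:", run_child(NORMAL_EXIT))
    print("os._exit():", run_child(ABRUPT_EXIT))
```

If a tool loses the tail of its output, an abnormal exit path of this kind inside the child is one thing worth ruling out; it is a different failure from the kernel deferring writes to disk.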

However, the manual page for Linux close(2) notes that closing a file descriptor does not necessarily guarantee that any data written to it has actually made it to disk:

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)

Thus, it would appear that, when a process exits, its standard output stream is flushed, but if that stream is actually backed by a file descriptor pointing to a file on disk, the write to disk is not guaranteed to have completed. I suspect that that may be what is going on here.

So, my actual questions:

Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?
Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?
Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?

"all open C streams with unwritten buffered data are flushed, all open C streams are closed". Aren't there two statements there saying that the streams are both flushed and closed? The [Linux exit man page](http://man7.org/linux/man-pages/man3/exit.3.html) has similar wording: "All open stdio(3) streams are flushed and closed." — kaylum, Jan 06 '16 at 02:50
Neither a flush nor a close is the same as a call to fsync(), so it's possible that after the process exits, the OS will keep the data in its own buffers for some amount of time, and not write it onto the physical disk until later. While the data is in an OS buffer and not yet physically on the disk, is it visible to other processes? — interfect, Jan 08 '16 at 19:01

score 1 · Accepted Answer · edited May 23 '17 at 11:50

Yes, it can be minutes before the data is physically written to the disk. But you can read it back long before that.

Unless you are worried about a power failure or a kernel panic, it doesn't matter whether the data is on disk. What matters is whether the kernel considers the data written.

It is safe to read from the file as soon as check_call() returns. If you don't see all the data, that may indicate a bug in bcftools, or that writeGlobalFile() doesn't upload all the data from the file. You could try to work around the former by disabling block-buffering mode for bcftools' stdout (provide a pseudo-tty, use the unbuffer command-line utility, etc.).
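One way to check this claim locally is to redirect a child's output to a file and inspect the file immediately after check_call() returns. This is a minimal sketch, not the asker's pipeline: the child is a Python one-liner standing in for bcftools, and `CHILD_CODE` and `capture_size` are hypothetical names introduced for the example.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the download command: writes N five-byte lines.
CHILD_CODE = "import sys; sys.stdout.buffer.write(b'line\\n' * int(sys.argv[1]))"


def capture_size(n_lines):
    """Redirect the child's stdout to a file and return the file size in bytes.

    The size is read right after check_call() returns, with no fsync().
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        out = open(path, "wb")
        try:
            subprocess.check_call(
                [sys.executable, "-c", CHILD_CODE, str(n_lines)], stdout=out
            )
        finally:
            out.close()
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    n = 100000
    print(capture_size(n), "bytes seen;", 5 * n, "bytes expected")
```

If the two numbers agree here but the real pipeline still loses data, that points toward the child tool or the upload step rather than the redirection itself.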

Q: Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?

Yes, and yes.

Q: Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?

No. fsync() is not enough in the general case. You likely don't need it anyway: reading the data back is a different issue from making sure that it is written to disk.

Q: Should I be calling flush() or some Python version of fsync() on the parent process's copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?

It would be pointless. .flush() flushes buffers that are internal to the parent process (you can use open(filename, 'wb', 0) to avoid creating unnecessary buffers in the parent).
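To illustrate the unbuffered-handle suggestion, here is a small sketch in which the parent writes a header through an unbuffered file object and then hands the same file to a child. The child is again a Python one-liner used as a placeholder, and `CHILD_CODE` and `header_then_child_output` are names invented for this example. Because the handle is opened with buffering set to 0, the parent's write reaches the kernel before the child starts, so no flush() is needed to keep the order.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child: writes one line to its (redirected) standard output.
CHILD_CODE = "import sys; sys.stdout.buffer.write(b'body\\n')"


def header_then_child_output():
    """Write a header from the parent, then let a child append via stdout."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # buffering=0: no user-space buffer in the parent, so the header is
        # handed to the kernel immediately, before the child is started.
        with open(path, "wb", 0) as out:
            out.write(b"header\n")
            subprocess.check_call([sys.executable, "-c", CHILD_CODE], stdout=out)
        with open(path, "rb") as result:
            return result.read()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(header_then_child_output())
```

With a default buffered handle, the header could still be sitting in the parent's buffer while the child writes, which is the only situation in which the parent's flush() would matter.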

fsync() works on a file descriptor (the child has its own file descriptor). I don't know whether the kernel uses different buffers for different file descriptors referring to the same disk file. Again, it doesn't matter: if you observe data missing without any crash, fsync() won't help here.
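For completeness, this is roughly what "some Python version of fsync()" looks like. It is a sketch only: `run_and_sync` is a hypothetical helper, and the reading that fsync() applies to the file a descriptor refers to (rather than to one descriptor's writes) is my understanding of the Linux fsync(2) page, so treat it as an assumption. It is relevant to surviving a crash or power loss, not to whether the parent can read the data back.

```python
import os
import subprocess
import sys
import tempfile


def run_and_sync(cmd, path):
    """Run `cmd` with stdout redirected to `path`, then request a disk sync."""
    with open(path, "wb", 0) as out:
        subprocess.check_call(cmd, stdout=out)
        # Ask the OS to push the file's data to the storage device.
        # Durability of the directory entry is a separate matter.
        os.fsync(out.fileno())


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        child = "import sys; sys.stdout.buffer.write(b'payload')"
        run_and_sync([sys.executable, "-c", child], path)
        with open(path, "rb") as result:
            print(result.read())
    finally:
        os.remove(path)
```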

Q: Just to be clear, I see that you're asserting that the data should indeed be readable by other processes, because the relevant OS buffers are shared between processes. But what's your source for that assertion? Is there a place in a spec or the Linux documentation you can point to that guarantees that those buffers are shared?

Look for "After a write() to a regular file has successfully returned" in the POSIX write() specification:

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
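That guarantee can be exercised across processes with a short experiment. The sketch below is an illustration under assumptions, not a proof: `READER_CODE` and `child_sees_write` are hypothetical names, and the reader is a separate Python process that opens the file by path. The parent writes with os.write() and neither closes the descriptor nor calls fsync() before the other process reads.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical reader: a separate process that dumps the file named in argv[1].
READER_CODE = (
    "import sys; sys.stdout.buffer.write(open(sys.argv[1], 'rb').read())"
)


def child_sees_write(payload):
    """Return True if another process reads back exactly what we wrote."""
    fd, path = tempfile.mkstemp()
    try:
        written = os.write(fd, payload)  # no close(), no fsync() before the read
        seen = subprocess.check_output([sys.executable, "-c", READER_CODE, path])
        return written == len(payload) and seen == payload
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(child_sees_write(b"visible without fsync\n"))
```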

answered Jan 06 '16 at 05:35 by jfs

So the OS doesn't guarantee that data gets written to disk when you close a file descriptor, but it does guarantee that all other readers on the filesystem will see your writes? – interfect Jan 06 '16 at 19:51
@interfect: `close(fd)` is unrelated. Yes, after successful `write(fd, data)` there is no guarantee that the data is written to disk physically but it doesn't prevent you from reading it back. – jfs Jan 07 '16 at 02:57
What about other processes reading it back? Is the buffer holding the data not yet committed to disk global, or per-process? What I'm worried about is that the filesystem may provide only read-your-own-writes consistency, and may re-order writes and reads by different processes arbitrarily. See for example the question and answer here http://stackoverflow.com/a/19406462/402891 about data not being readable by another process when it receives an inotify message that a file has been closed. – interfect Jan 07 '16 at 20:52
@interfect: again, the answer you've linked is completely unrelated to your case (it discusses when the data is written *physically* to disk). Yes, [if you write to the same file from different processes without synchronization then there is no definite order](http://goo.gl/Pavl03). I don't see how it is relevant to your case: only the child writes to the file and only the parent reads from the file after the child is already dead in your case -- the order is fixed -- it is the order used by the child. – jfs Jan 08 '16 at 05:20
The discussion there is using "on disk" as a synonym for "can be read by another process"; in that case, a process is getting a notification that another process has closed a file, but cannot see the data that the other process has written. Whether the data is physically on disk or in a shared cache is not really what they're getting at there. That answer provides some evidence that a write and a close from one process that happen before a read from another can be placed after it by the OS; I want to know if the death of one process in the intervening time adds additional ordering guarantees. – interfect Jan 08 '16 at 18:55
Just to be clear, I see that you're asserting that the data should indeed be readable by other processes, because the relevant OS buffers are shared between processes. But what's your source for that assertion? Is there a place in a spec or the Linux documentation you can point to that guarantees that those buffers are shared? – interfect Jan 08 '16 at 19:05
@interfect: look for [*"After a write() to a regular file has successfully returned"*](http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html) – jfs Jan 08 '16 at 19:19
"Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified." That sure sounds like other processes should also be able to see the data with a read. – interfect Jan 11 '16 at 18:50
