
Note: Please read to the end before marking this as duplicate. While it's similar, the scope of what I'm looking for in an answer extends beyond what the previous question was asking for.

Widespread practice, which I tend to agree with, is to treat close purely as a resource-deallocation function for file descriptors rather than as a potential I/O operation with meaningful failure cases. And indeed, prior to the resolution of issue 529, POSIX left the state of the file descriptor (i.e. whether it was still allocated or not) unspecified after errors, making it impossible to respond portably to errors in any meaningful way.
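
To make that concrete, here is roughly what handling that relies on the post-529 semantics might look like (a minimal sketch only; `close_retry` is an illustrative name, and it assumes the post-529 behaviour where `EINTR` means the descriptor is still open and `EINPROGRESS` means the close will still complete):

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close fd, assuming post-issue-529 semantics
 * (EINTR => fd is still allocated and may be retried; EINPROGRESS =>
 * the close was interrupted but the fd is gone). Returns 0 on success,
 * -1 with errno set on a real error such as EIO. */
static int close_retry(int fd)
{
    for (;;) {
        if (close(fd) == 0)
            return 0;
        if (errno == EINTR)
            continue;              /* fd still open; retry */
        if (errno == EINPROGRESS)
            return 0;              /* fd gone; treat as success */
        return -1;                 /* EBADF, EIO, ENOSPC, EDQUOT, ... */
    }
}
```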

However, a lot of GNU software goes to great lengths to check for errors from close, and the Linux man page for close calls failure to do so "a common but nevertheless serious programming error". It cites NFS and disk quotas as circumstances under which close might produce an error, but gives no details.

What are the situations under which close might fail, on real-world systems, and are they relevant today? I'm particularly interested in knowing whether there are any modern systems where close fails for any non-NFS, non-device-node-specific reasons, and as for NFS or device-related failures, under what conditions (e.g. configurations) they might be seen.

R.. GitHub STOP HELPING ICE
  • Perhaps this question is similar to the one it's marked as a duplicate of, but the other question does not have sufficient answers that address the specifics I asked for at the end of this one, and does not seem to be asking for such specifics. So I think there's at least some difference. If this question is to remain closed, how should I go about getting more-acceptable answers to the question it's a "duplicate" of? – R.. GitHub STOP HELPING ICE Jun 29 '14 at 16:41
  • In Linux, `close()` can only fail for a normal file, if the underlying filesystem has a `flush()` file_operations handler. A quick grep says only `cifs`/`smbfs` (Windows file sharing), `exofs`, `nfs` (v3 and v4), and `fuse` do. (`ecryptfs` also does, but it just calls the underlying filesystem `flush()` if it has one). `close()` time errors are rare, but they do occur. For NFS, you need a glitchy connection. For fuse, I think filesystem corruption is the most likely one. (continued) – Nominal Animal Jun 29 '14 at 17:14
  • @NominalAnimal: I would think you would need both a glitchy connection **and** non-robust configuration. In order for the ordering properties of writes to be guaranteed, every write has to involve a round trip to the server. So my understanding would be that this issue only arises on misconfigured NFS setups doing some sort of caching that already breaks write semantics in other ways... – R.. GitHub STOP HELPING ICE Jun 29 '14 at 17:18
  • NFSv3 only guarantees write ordering if the `sync` mount option is used on all clients accessing the file. Typical installations only provide close-to-open cache consistency, as `sync` slows down file access too much. File-wide `fcntl()` locks must be used to manage concurrent writes. On NFSv4, file delegation is used. I *think* (not sure!) that on NFSv4 the `close()` error path would involve the closer missing a re-delegation prior to the `close()`, the NFSv4 server going on with the delegation after some timeout, ending up rejecting the `close()`-time `->flush()`. – Nominal Animal Jun 29 '14 at 17:33
  • (continued) Having used NFSv4 without seeing a `close()` error, I think `fuse` (and `ecryptfs` on top of `fuse`) is the likeliest one, especially with a corrupted filesystem (bad media, nasty crash, etc.). I agree that most users will never see a `close()` error. However, just because it is *rare* does not mean you should ignore it. *"It's too rare to bother with"* is a pretty lousy reason for an application to choose not to tell the user if their data was corrupted. I work with very time-consuming data, though. If your app is just a fun widget, I might not bother either. – Nominal Animal Jun 29 '14 at 17:40
  • @NominalAnimal: Except that `close` does not necessarily report such errors; the write to physical media which has failed may not happen until long after your program exits. For programs which need to know that the data was synced to disk, they need to use `fsync` anyway. If you've already called `fsync` successfully, I don't see any way `close` could fail. And if you haven't called `fsync`, checking for failure from `close` seems insufficient. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 17:49
  • Physical media errors cannot usually be detected at write time anyway. I meant *corrupted* filesystems, i.e. where the underlying metadata was corrupted. Say, incorrect fragment/block index, caught only when trying to write to it, flushing final cached data at `close()` time. If you have no `write()` errors and no error at `close()`, you know it was written correctly, although it may not be in long-term storage yet. `fsync()` waits until data hits media, a much stronger requirement -- and it might be very slow, especially with fuse: consider sshfs etc. `close()` checking costs almost nothing. – Nominal Animal Jun 29 '14 at 18:34
  • @NominalAnimal: What does "written correctly, but not in long-term storage yet" mean, though? I can't pull out the nerdstick/shut off the NFS server as soon as `close` returns successfully---I have to wait for a sync---so what does `close`-checking add, exactly, in terms of guarantees? Is it just a canary that makes some noise if you've got a kernel bug? – tmyklebu Jun 29 '14 at 19:11
  • @NominalAnimal: My impression -- and this may be wrong, that's what I'm trying to ascertain -- is that checking `close` provides absolutely no information about failure to commit the file to storage, either due to physical device failures or logical failures (corrupted filesystem). So in a sense, the only usefulness of checking `close` seems to be attempting to provide a *stronger* data-consistency guarantee for NFS-with-caching than what you provide for local files. And that strikes me as dubious. If you care about consistency you should care about both, and use `fsync`. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 19:39
  • @tmyklebu, R..: *Canary* is probably the best term to describe how I think of it. No "stronger" guarantees, I just don't want to miss a problem detected by the kernel/nfsd/userspace components of fuse. Bugs in corrupted fs handling, delegation issues with NFSv4 on spotty connections, bad error handling in fuse filesystems, are what I am thinking about -- human errors. You seem to assume/assert `close()` will never fail in any meaningful way. Based on what? Trust? Hope? *Standards*? I treat `close()` just like I would a `read()` or `write()`. I may be wrong, but I want to err on the safe side. – Nominal Animal Jun 29 '14 at 20:11
  • @NominalAnimal: It's not that I assume `close` will never fail in a meaningful way. It's that, if it does, I have no idea how to recover since I have no clear mental model of why it would fail. So I print a message to stderr and bomb out instead of trying to handle it. Error paths that are only triggered under conditions I don't understand give me the willies. (If it really is just a canary for buggy fs implementations, this is all fine.) – tmyklebu Jun 29 '14 at 20:17
  • @NominalAnimal: I'm not asserting that it won't fail in a meaningful way but questioning whether it does. I agree totally that you should check `write` for errors, but so far, all instances where `close` is returning an error seem to be bugs in `write` that happened under historical (and probably no-longer-relevant, e.g. the quota issue) or poorly-configured (NFS violating POSIX semantics for regular files via caching) setups: failure to report an error that `write` is required to report. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:17
  • @R..: Crap fuse file systems aren't really historical. – tmyklebu Jun 29 '14 at 20:24
  • @tmyklebu: Indeed. But I'm not really inclined to cover up their bugs for them... – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:29
  • @tmyklebu: Like I said: I treat `close` errors as read/write errors. I do the same as if a read or write error had occurred, usually bomb out. To me, it is a simple, working mental model; if you see a problem there, I'd really appreciate the details. I do want to be careful, even paranoid, wrt. error handling, as long as I don't produce false positives. If you detect a problem, the user might be able to work around it; my responsibility ends at letting the user know. I don't care *whose* bug or brainfart caused the problem, just knowing it did suffices for me. – Nominal Animal Jun 29 '14 at 20:30
  • BTW I'm pretty sure there is no plausible argument that checking `close` is useful for files used only for reading... – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:32
  • @NominalAnimal: Part of the interest of this question is that, in cases where you don't intend to call `close` but want to use `dup2` or `dup3` to replace file descriptors atomically, you have to jump through some serious hoops to get the error result of `close`: First `dup` the fd (which could fail for `EMFILE`, in which case you're stuck), then do the atomic replacement, then `close` the duplicate to get the error. This seems like a lot of mess to go through if the error status is not really meaningful. (This dance is sketched after these comments.) – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:35
  • @R..: They aren't necessarily bugs. It's often useful to expose something as a file system even when it might be harmful, stupendously inconvenient or impossible to make it satisfy the guarantees various fs operations have, yet the utility of being able to mount the thing and work with it as if it were a file system outweighs the harm from broken promises. Floppies mounted async, nerdsticks, NFS, and anything with kerberos in the middle I think runs the gamut of extant not-quite-file systems that are nonetheless more useful than what we'd get if we insisted on POSIX-compliance. – tmyklebu Jun 29 '14 at 20:45
  • @R..: To be honest, I only check `close()` for errors when I call it explicitly. If it is implicitly closed, I trust the syscall (`dup2()` or `dup3()` in your example) to tell me if there was an error. (Oh hey, in current Linux kernels they do not: `fs/file.c:do_dup2()` ignores the `filp_close()` return value; the one that could tell us if a close-time filesystem `->flush()` error occurred. I don't mind, though; I'm not trying to give *better* guarantees of anything, just work with what I have.) – Nominal Animal Jun 29 '14 at 20:48
  • @NominalAnimal: `dup2` **fundamentally** has to hide the error from closing the file being replaced; this is not a bug. This is because the fd has already been relinquished and needs to be replaced atomically. If it returned with the error status, the destination fd would be freed and there would be a race condition where another file could be assigned that value. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 21:13
  • @R..: No, I didn't think or intend to imply it was a bug. It is clear one would need new syscalls, say `dup2check(int oldfd, int newfd, int *newfd_close_errno)` and `dup3check(int oldfd, int newfd, int flags, int *newfd_close_errno)`. Although they'd be very simple to implement in Linux (even I could do that patch!), I don't have a single use case I can think of where I'd care about newfd enough, and in any case I'm not sure `close` checking is **that** important. (Pushing even a simple patch through both the kernel and glibc developers is quite a bit of work.) – Nominal Animal Jun 29 '14 at 23:23
  • @NominalAnimal: Indeed, if you can give up `close` checking when using `dup2`/`dup3`, this suggests to me that `close` checking is being done less *because it's important* and more like *because you can* or *because it fits a pattern you've made into a habit*. I don't think this is a bad thing, but it's a distinction worth making IMO. – R.. GitHub STOP HELPING ICE Jul 01 '14 at 19:51
  • @R..: No. If your logic was true, you look both ways before crossing a road "less *because it's important* and more like *because you can* or *because it fits a pattern you've made into a habit*". It is important for safety/knowing if a problem was detected. I'm just willing to take a risk, if explicitly checking `close` results in code that is unlikely to be and remain bug-free itself; false positives would be a serious problem for the user. My goal is reliable operation *as a whole*. I cannot write or maintain perfect code, so certain tradeoffs have to be made. – Nominal Animal Jul 01 '14 at 21:35
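
For reference, the `dup`/`dup2`/`close` dance described in the comments above looks roughly like this (a sketch only; `replace_fd` is a hypothetical helper, `EINTR` handling is omitted, and the initial `dup` failing with `EMFILE` is precisely the case where you are stuck):

```c
#include <errno.h>
#include <unistd.h>

/* Atomically replace 'target' with 'newfd' while still observing the
 * close error for whatever 'target' used to refer to. Sketch only:
 * assumes target != newfd. */
static int replace_fd(int target, int newfd)
{
    int spare = dup(target);       /* keep the old open file description alive */
    if (spare < 0)
        return -1;                 /* EMFILE: cannot learn the close status */

    if (dup2(newfd, target) < 0) { /* atomic replacement; hides close errors */
        int saved = errno;
        close(spare);
        errno = saved;
        return -1;
    }

    /* Closing the spare drops the last reference to the old description
     * (assuming no other fd refers to it), so any deferred flush-time
     * error surfaces here. */
    return close(spare);
}
```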

2 Answers


Once upon a time (24 March 2007), Eric Sosman had the following tale to share in the comp.lang.c newsgroup:

(Let me begin by confessing to a little white lie: It wasn't fclose() whose failure went undetected, but the POSIX close() function; this part of the application used POSIX I/O. The lie is harmless, though, because the C I/O facilities would have failed in exactly the same way, and an undetected failure would have had the same consequences. I'll describe what happened in terms of C's I/O to avoid dwelling on POSIX too much.)

The situation was very much as Richard Tobin described. The application was a document management system that loaded a document file into memory, applied the user's edits to the in-memory copy, and then wrote everything to a new file when told to save the edits. It also maintained a one-level "old version" backup for safety's sake: the Save operation wrote to a temp file, and then if that was successful it deleted the old backup, renamed the old document file to the backup name, and renamed the temp file to the document. bak -> trash, doc -> bak, tmp -> doc.

The write-to-temp-file step checked almost everything. The fopen(), obviously, but also all the fwrite()s and even a final fflush() were checked for error indications -- but the fclose() was not. And on one system it happened that the last few disk blocks weren't actually allocated until fclose() -- the I/O system sat atop VMS' lower-level file access machinery, and a little bit of asynchrony was inherent in the arrangement.

The customer's system had disk quotas enabled, and the victim was right up close to his limit. He opened a document, edited for a while, saved his work thus far, and exceeded his quota -- which went undetected because the error didn't appear until the unchecked fclose(). Thinking that the save succeeded, the application discarded the old backup, renamed the original document to become the backup, and renamed the truncated temp file to be the new document. The user worked a little longer and saved again -- same thing, except you'll note that this time the only surviving complete file got deleted, and both the backup and the master document file are truncated. Result: the whole document file became trash, not just the latest session of work but everything that had gone before.

As Murphy would have it, the victim was the boss of the department that had purchased several hundred licenses for our software, and I got the privilege of flying to St. Louis to be thrown to the lions.

[...]

In this case, the failure of fclose() would (if detected) have stopped the delete-and-rename sequence. The user would have been told "Hey, there was a problem saving the document; do something about it and try again. Meanwhile, nothing has changed on disk." Even if he'd been unable to save his latest batch of work, he would at least not have lost everything that went before.
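
The sequence the anecdote argues for looks roughly like this in POSIX-flavoured C (an illustrative sketch only; `save_document` is a made-up name, and real code would also handle the backup dance, clean up the temp file on failure, and fsync if it needs durability, not just consistency):

```c
#include <stdio.h>

/* Write 'len' bytes of 'data' to 'tmp', and only rename it over 'doc'
 * if every step -- including fclose() -- reported success.
 * Sketch only; error cleanup and directory sync are omitted. */
static int save_document(const char *doc, const char *tmp,
                         const void *data, size_t len)
{
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;

    if (fwrite(data, 1, len, f) != len || fflush(f) != 0) {
        fclose(f);                  /* best effort; we are already failing */
        return -1;
    }

    /* An fsync(fileno(f)) here would additionally wait for the data to
     * reach storage; fclose() alone does not guarantee that. */
    if (fclose(f) != 0)             /* the check the anecdote was missing */
        return -1;

    return rename(tmp, doc);        /* only reached after a clean close */
}
```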

Nisse Engström
  • My question contained the phrase "on **real-world systems**, and are they **relevant today**" and was tagged `posix` and `linux`. So an anecdote related to VMS does not really apply. Are quotas on any modern systems implemented that nonsensical way (checking on flush-to-physical-media rather than on logical write)? – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:07
  • Part of the reason the distinction is relevant is that POSIX places certain fairly strong requirements on file operations on regular files: they're atomic with respect to each other, and immediately visible to other processes. So if `write` succeeds, the data actually has to be committed to the *logical* file (so other processes accessing the file can see it); the only question is whether it's on the physical media, which only matters in case power fails. And for the latter, only `fsync`, not `close` checking, will tell you the answer. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:09
  • @R..: I think it's still an interesting tale to tell to those who wonder why you should check the return value of (f)close(), but I'll be happy to delete it if it is irrelevant. – Nisse Engström Jun 29 '14 at 20:14
  • @R..: A fuse filesystem is allowed to cache the *logical* file, since all processes accessing it will see the same cached state. It is not required to flush the cache to, say, an underlying file system at any point; if it does a final flush at close time, that is a perfectly valid point of reporting a write error (originating from the underlying storage filesystem, not at the logical level). Nisse Engström: fuse filesystems are not that different from that VMS filesystem, so in my humble opinion, the anecdote is valid and useful. – Nominal Animal Jun 29 '14 at 20:18
  • The bit about `fflush` appears to be a red herring here. `fflush` does not, to my knowledge, cause a `fsync`. All the anecdote seems to be getting at is that `write` calls can succeed when a subsequent `close` would fail. – tmyklebu Jun 29 '14 at 20:23
  • @tmyklebu: `fflush` has not been mentioned. The references to "flush" are a kernel and/or fuse implementation detail. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:24
  • @R..: `fflush` is mentioned in the anecdote. No `fsync` or similar is mentioned, however. – tmyklebu Jun 29 '14 at 20:25
  • @NominalAnimal: Using it as an analogy for fuse is definitely more interesting and, I think, makes it relevant to the question. On the other hand, I think conceptually this is a failure of the "physical storage layer" that the fuse fs resides on. If fuse is simply using an underlying file/device locally and already reserved space for storage at the time of `write` (or earlier), then such errors are just hardware errors that you wouldn't expect to see without `fsync`.... – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:27
  • ... If on the other hand the fuse fs is actually doing the allocation/write-to-remote-storage at close time, this is a bug in the implementation: it's failing to report errors during `write` that need to be reported at `write` time, i.e. the same bug that exists in caching NFS clients. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:27
  • @tmyklebu: Read the parenthetical confession at the top of the anecdote. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 20:29
  • @R..: Yeah, but `fflush` is "flush the userspace buffer in the `FILE *`" rather than "make sure this has all hit disk." – tmyklebu Jun 29 '14 at 20:30
  • @R..: Agreed, except that fuse filesystems have the capability of reporting the error at `close` time. It could be, for example, a lost connection with an sftpfs or similar. As a user, I prefer they did report it at close time, since I can then decide what to do about it. Even if it is technically a "hardware/underlying storage-level error", why *not* report it as soon as it is detected? And why require ordered writes, when a fs can report the error at close time? It could fit the user's needs better! – Nominal Animal Jun 29 '14 at 20:56

Consider the inverse of your question: "Under what situations can we guarantee that close will succeed?" The answer is:

  • when you call it correctly, and
  • when you know that the file system the file is on does not return errors from close in this OS and kernel version

If you are convinced that your program doesn't have any logic errors and you have complete control over the kernel and file system, then you don't need to check the return value of close.

Otherwise, you have to ask yourself how much you care about diagnosing problems with close. I think there is value in checking and logging the error for diagnostic purposes:

  • If a coder makes a logic error and passes an invalid fd to close, then you'll be able to quickly track it down. This may help to catch a bug early before it causes problems.
  • If a user runs the program in an environment where close does return an error when (for example) data was not flushed, then you'll be able to quickly diagnose why the data got corrupted. It's an easy red flag because you know the error should not occur.
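
For example, a small logging wrapper along these lines (a sketch only; the name `close_logged` and the logging style are illustrative) costs almost nothing and makes both cases visible:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical wrapper: close fd and log any error for later diagnosis.
 * Returns the result of close() so callers can still act on it. */
static int close_logged(int fd, const char *what)
{
    if (close(fd) == 0)
        return 0;

    if (errno == EBADF)
        fprintf(stderr, "BUG: close(%s): bad file descriptor %d\n", what, fd);
    else
        fprintf(stderr, "close(%s) failed: %s\n", what, strerror(errno));
    return -1;
}
```
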
Anton
  • Passing an invalid fd to `close` is not catchable; more-likely, the fd is valid but belongs to another part of the code, and by closing it, you trigger catastrophic file corruption or information leak. So I don't think there's any value in trying to catch `EBADF`. You just need to ensure that it can't happen (and possibly use `assert` at debug time if you're having trouble doing that). In any case, `EBADF` is outside the scope of what I intended to ask about. – R.. GitHub STOP HELPING ICE Jun 29 '14 at 19:34
  • As for the second case, `close` **does not** flush data for ordinary local files. Aside from deallocating the fd and resources associated with the open file description (if this was the last fd referring to this open file description) it's essentially a no-op. So I can't see how checking for errors from `close` does anything but give you some (still weak) consistency guarantees for unusual (NFS-with-caching, strange devices, fuse, ...?) files that you would not have for ordinary local files. It seems like if you really need consistency you have to use `fsync`. And if not, why bother? – R.. GitHub STOP HELPING ICE Jun 29 '14 at 19:42
  • @R..: fuse filesystems do have a filesystem-specific `file_operations->flush` handler, called at `close` time, and this is the only point where an error can occur. So, your assertion that *"`close` **does not** flush data for ordinary local files"* is incorrect for fuse filesystems. Exactly what that `->flush` entails, is completely up to the filesystem -- I have no idea -- but I for one do not wish to assume it cannot fail, and I have yet to see any reason why it should never fail. (Other than lots of applications not checking for `close()` errors.) – Nominal Animal Jun 29 '14 at 21:01
  • @NominalAnimal: Are you saying that fuse doesn't provide any way to report errors at `write` time? That perhaps makes the issue more interesting. – R.. GitHub STOP HELPING ICE Jun 30 '14 at 06:35
  • @R..: No, I mean there may be fuse file systems that can detect an error at `close` time that they could not detect during the last `write`, even if they are sane and useful otherwise. For example, consider a fuse filesystem that commits the file to revision control at close time, but where the current user's access was revoked between the last `write` and the `close`. – Nominal Animal Jun 30 '14 at 09:16
  • @NominalAnimal: I see. BTW this type of issue makes fd leaks to child processes a lot more problematic too, if you're trying to handle it: `close` will always succeed as long as there's any other fd lying around (in any process) referring to the same open file description. – R.. GitHub STOP HELPING ICE Jul 01 '14 at 19:38
  • @R..: True. However, fd leak is a bug itself, is it not? My own `fork`+`exec` routines explicitly close all file descriptors (up to max(`getrlimit(RLIMIT_NOFILE),sysconf(OPEN_MAX)`), since the cost is just a couple of thousand syscalls). Using the recent linux-specific [open file description locks](http://man7.org/linux/man-pages/man2/fcntl.2.html) even avoid the locking pitfalls (as closing *any* of the file descriptors unlocks POSIX record locks). – Nominal Animal Jul 01 '14 at 21:28
  • @NominalAnimal: Your "close all" loop invokes undefined behavior, per [Austin Group interpretation for issue 149](http://austingroupbugs.net/view.php?id=149). `open`+`O_CLOEXEC` and other interfaces which atomically set the `FD_CLOEXEC` flag along with allocating the fd are the conforming way to avoid fd leaks. As for fd leaks being a bug, yes, but aside from the `close` error reporting issue, fd leaks are largely irrelevant except when they accumulate and lead to fd exhaustion or when the fd leaks into a different privilege domain (e.g. from webserver to cgi). – R.. GitHub STOP HELPING ICE Jul 02 '14 at 01:55
  • BTW if you insist on doing the close-all, you can make it about 1000x faster in the common case by using `poll` with a zero timeout to check for `POLLNVAL` on all the file descriptors with a single syscall, and only closing the ones that are actually open (sketched after these comments). – R.. GitHub STOP HELPING ICE Jul 02 '14 at 01:57
  • @R..: You're absolutely right! `poll()` with zero timeout and `POLLNVAL` for all those descriptors to find which ones are already closed, then `fcntl(F_SETFD, FD_CLOEXEC)` on the others (not intended to be kept), is what I should have been doing all along. Thanks! (In my defence, I must say I've only done it in Linux in cases where a leak would be a security risk, and I thought security would override the other (including AG'149) considerations. Nevertheless, you've pointed out a conformant, and much faster, option!) – Nominal Animal Jul 02 '14 at 03:11
  • @NominalAnimal: It's still not conforming, because, as explained in the response for issue 149, a conforming implementation can have internal-use file descriptors which a new process image (via `execve`) needs to inherit in order to provide conforming behavior (certain tracing implementations was the example given). I agree this is unlikely to matter in practice, and this approach is much better than the standard "close everything" solutions. – R.. GitHub STOP HELPING ICE Jul 02 '14 at 03:56
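
For what it's worth, the `poll`-based check suggested in the last comments might look roughly like this (a sketch only; `close_from` is an illustrative name, it assumes the whole descriptor range fits in a single `poll()` call, and as noted above it is still not strictly conforming):

```c
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Sketch: find which descriptors in [from, upto) are open with a single
 * poll() call (POLLNVAL marks the closed ones), then close only the open
 * ones. One could set FD_CLOEXEC instead of closing, as discussed above. */
static void close_from(int from, int upto)
{
    int n = upto - from;
    if (n <= 0)
        return;

    struct pollfd *pfd = calloc((size_t)n, sizeof *pfd);
    if (!pfd)
        return;                       /* caller may fall back to closing blindly */

    for (int i = 0; i < n; i++) {
        pfd[i].fd = from + i;
        pfd[i].events = 0;            /* we only care about POLLNVAL in revents */
    }

    if (poll(pfd, (nfds_t)n, 0) >= 0) {
        for (int i = 0; i < n; i++)
            if (!(pfd[i].revents & POLLNVAL))
                close(pfd[i].fd);     /* fd was open: close it */
    }

    free(pfd);
}
```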