
On my system (Ubuntu Linux, glibc), the man page for the close() call lists several error values it can return. It also says

Not checking the return value of close() is a common but nevertheless serious programming error.

and at the same time

Note that the return value should only be used for diagnostics. In particular close() should not be retried after an EINTR since this may cause a reused descriptor from another thread to be closed.

So I am allowed neither to ignore the return value nor to retry the call.

Given that, how should I handle a close() call failure?

If the error happened while I was writing something to the file, I am probably supposed to try to write the information somewhere else to avoid data loss.

If I was only reading the file, can I just log the failure and continue the program, pretending nothing happened? Are there any caveats, such as a leak of file descriptors or anything else?
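
For what it's worth, here is a minimal sketch of the "log and continue" pattern this question is asking about; whether it is sufficient is exactly what the answers below address. The function name and file layout are illustrative, not part of the question:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read-only case: report a failed close() for diagnostics and carry on.
 * Hypothetical helper, for illustration only. */
int read_config(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof buf);   /* ... use the data ... */

    if (close(fd) != 0) {
        /* Diagnostics only: the fd is invalid now, do NOT close() it again. */
        fprintf(stderr, "warning: close(%s): %s\n", path, strerror(errno));
    }
    return n < 0 ? -1 : 0;
}
```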

Ilya Popov
  • Thought about this too. (http://unix.stackexchange.com/questions/231677/failing-close-system-call) Close failures make sense in certain cases (e.g., faulty disk syncs), but I think it should be safe to assume close won't fail in some other cases, like closing an instance of a duplicated file descriptor which isn't the last instance pointing to the same physical file, or closing a pipe, because those would basically be kernel bugs. But I would love to hear a more enlightened answer. – Petr Skocik Oct 13 '15 at 23:25
  • FWIW, Raymond Chen's take on this general type of situation: http://blogs.msdn.com/b/oldnewthing/archive/2008/01/07/7011066.aspx – Michael Burr Oct 13 '15 at 23:36
  • Whatever you do, *always let the user know*. Just "logging" it into some internal log file nobody ever looks at is not enough; you'll want the user to know that something hinky is happening. For GUI applications, I'd pop up a modal dialog box. For command-line applications, I'd print a warning to standard error. For services, the log file suffices. If a `close()` error happens after writing to a file, I'd abort exactly the same way I would if I encountered a write error while writing to the file. – Nominal Animal Oct 14 '15 at 11:51

2 Answers


In practice, close should never be retried on error, and the fd you passed to close is always invalid (closed) after close returns, regardless of whether an error occurred. In some cases, an error may indicate that data was lost (certain NFS setups) or unusual hardware conditions for devices (e.g. tape could not be rewound), so you may want to be cautious to avoid data loss, but you should never attempt to close the fd again.

In theory, POSIX was unclear in the past as to whether the fd remains open when close fails with EINTR, and systems disagreed. Since it's important to know the state (otherwise you have either fd leaks or double-close bugs, which are extremely dangerous in multithreaded programs), the resolution of Austin Group issue #529 specified the behavior strictly for future versions of POSIX: EINTR means the fd remains open. This is the right behavior, consistent with the definition of EINTR elsewhere, but Linux refuses to accept it. (FWIW, there's an easy workaround for this that's possible at the libc syscall-wrapper level; see glibc PR #14627.) Fortunately it never arises in practice anyway.
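
To make the "diagnose, never retry" rule concrete, here is a minimal sketch of a write path under those constraints. The fsync() before close() is an added assumption about what "being cautious to avoid data loss" might look like; the answer itself does not prescribe it:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write path: surface close() errors as possible data loss,
 * but never call close() twice on the same fd. Illustrative sketch. */
int write_file(const char *path, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* write(), unlike close(), is safely retryable */
            close(fd);           /* best effort; the write error is what we report */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {        /* assumption: flush so deferred I/O errors show up here */
        close(fd);
        return -1;
    }

    /* Whatever close() returns, fd is invalid afterwards. */
    return close(fd) == 0 ? 0 : -1;
}
```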

R.. GitHub STOP HELPING ICE
  • So if on `EINTR` the fd remains open, does it continue to be an error to try to close it again? What the hell should we do in that case? Closing an invalid descriptor should return `EBADF`, so it's not a problem to try it again (despite what the manual says about other threads opening file descriptors without our knowledge; what should happen if another thread opens a descriptor while you are in the middle of an I/O redirection? ---this is a two-phase procedure). Hmmm... aren't we forcing the machine too much? – Luis Colorado Oct 15 '15 at 04:57
  • @LuisColorado: Per the (amended) standard, on `EINTR` the fd remains open. However, Linux does not honor this and glibc does not work around the failure to honor it. See the links in my answer. Fortunately `EINTR` does not happen on `close` on Linux in any real-world situations I'm aware of, anyway. – R.. GitHub STOP HELPING ICE Oct 15 '15 at 04:59
  • `EINTR` can happen on `close(2)` on NFS filesystems, on network connections (well, on TCP/IP sockets the kernel actually does the work, but I'm not sure about other protocols), as well as on every device that needs handshaking to be closed (on the last close, depending on the device driver's return from close). And Linux is not the *only* POSIX system that exists. – Luis Colorado Oct 15 '15 at 05:10
  • @LuisColorado: At least on Linux, the `release` file op cannot cause `EINTR` (its return value is ignored), but `flush` can. See the question I linked. This probably prevents it from happening in practice, but I may be mistaken. I agree there are other systems to worry about. Unless you have `posix_close` available (added for POSIX-future) the only fully-safe thing to do is to mask all interrupting signals whenever you call `close`... :-( – R.. GitHub STOP HELPING ICE Oct 15 '15 at 14:38
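
A possible sketch of the signal-masking workaround mentioned in the last comment above, assuming a multithreaded POSIX program (compile with -pthread). The helper name safe_close is invented for illustration; it is not the posix_close added for POSIX-future:

```c
#include <signal.h>
#include <unistd.h>

/* Block all maskable signals around close() so it cannot be
 * interrupted and fail with EINTR; restores the old mask afterwards. */
int safe_close(int fd)
{
    sigset_t all, old;
    sigfillset(&all);
    pthread_sigmask(SIG_SETMASK, &all, &old);
    int r = close(fd);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return r;
}
```

The trade-off is that close() may then block until the operation completes instead of returning early with EINTR, which is usually acceptable given how dangerous the alternative interpretations are.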

First of all: EINTR means exactly that: the system call was interrupted. If this happens on a close() call, there is exactly nothing you can do.

Apart from perhaps keeping track of the fact that, if the fd belonged to a file, that file is possibly corrupt, there is not much you can do about errors on close() at all - depending on the return value. AFAIK, the only case where a close() can be retried is on EBUSY, but I have yet to see that.

So:

  • Not checking the result of close() might mean that you miss file corruption, especially truncation (see the sketch after this list).
  • Depending on the error, most of the time you can do nothing - a failed close() just means something has gone awfully wrong outside the scope of your application.
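
As a sketch of the first bullet, here is a small copy program where ignoring close() would silently miss a possibly truncated output file; deferred write errors (e.g. on the NFS setups mentioned in the other answer) surface exactly here:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Copy stdin to a file, treating a failed close() on the output
 * like a failed write(). Short writes are treated as errors for brevity. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s OUTFILE\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
        if (write(fd, buf, (size_t)n) != n) { perror("write"); return 1; }
    }

    if (close(fd) != 0) {
        fprintf(stderr, "close %s: %s -- output may be truncated\n",
                argv[1], strerror(errno));
        return 1;
    }
    return 0;
}
```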
Eugen Rieck
  • `EINTR` means the syscall was interrupted and will not be retried, so it was not executed at all. The system call did not execute, so the file descriptor is not closed and must be closed. What about the atomicity of system calls? If the system call was not performed and we cannot close the descriptor again, what about repeating this process a bunch of times, leaking one descriptor each time? Normally, the implementation of calls that block (like close(), but not for normal files) just undoes what has been done and makes a `longjmp(3)` to the saved context, precisely to preserve atomicity. – Luis Colorado Oct 15 '15 at 05:01
  • @LuisColorado Retrying `close()` after `EINTR` is not a good idea on Linux. It might close a different fd. – Eugen Rieck Oct 15 '15 at 09:06
  • Device driver writers are always warned about the close primitive, as it must do its work independently of the hardware conditions and return a stable environment. This means device driver writers must sometimes leave resources locked in memory to wait for the device to respond, or mark it as unusable until some condition clears, even though the process disconnected long ago. Remember that device drivers aren't normally written by the same people who write the operating system. I'm not talking specifically about Linux, but about POSIX, which means a lot of different systems. – Luis Colorado Oct 16 '15 at 05:11
  • Then `close(2)`ing and `dup(2)`ing to redirect output is also not a good idea on Linux, as another thread can fill the hole. I understood your message the first time. Atomicity is a hard problem indeed, but don't assume you are telling the truth and I am not. At this moment, nobody has written a thread-safe way to redirect a file descriptor, and the whole system continues to work without pain. Perhaps, considering that `close(2)` is not thread-safe, locking the context for the whole process would allow you to retry a `close(2)` call until you are safe. Or announce the leak. – Luis Colorado Oct 16 '15 at 05:17
  • And consider that if the result of `close(2)` means that you have not closed the descriptor, it's impossible for it to have moved elsewhere. You will for sure close **the same** file descriptor, because as it has not been closed, its place cannot be filled by another, different descriptor (even in multithreaded environments). What is indeed an error is to re-`close(2)` it without having checked the return code from the first call, assume it was not closed, and redo the syscall on a possibly closed-and-reopened file descriptor. – Luis Colorado Oct 16 '15 at 05:21
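
To illustrate the descriptor-reuse race this comment thread is debating, here is an assumed interleaving (comments, not runnable code) showing why a blind retry after EINTR is dangerous on Linux, where the kernel releases the descriptor even when close() reports an error:

```c
/* Assumed interleaving between two threads on Linux:
 *
 *   Thread A                          Thread B
 *   --------                          --------
 *   close(7) -> -1, errno == EINTR
 *   (fd 7 is already released)
 *                                     open("log") -> returns 7
 *   close(7)    <- the "retry"
 *   (B's log file is now closed out from under it)
 *
 * On a system where EINTR means the fd stayed open, the retry
 * would be safe -- which is exactly the disagreement above. */
```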