2

In general, once a handle to the file is open, the file is open, and nothing that changes the directory structure can change that. The file can be moved, renamed, or replaced by something else, and it remains open by construction: on Linux/Unix there is no real delete for files, only unlink, which does not necessarily delete the file but merely removes a link from the directory. Result: the file handle stays valid whatever happens to the file's name.

However, if the underlying device disappears (e.g. the file is on a USB stick that is removed from the system) then the file won't be accessible any longer.

I have a program that opens a huge binary file (> 4 GB) of some other application at start. Afterwards, it watches the file for changes, by querying

fseek(filepointer, 0L, SEEK_END);
long int pos = ftell(filepointer);

quite often (every few milliseconds) and reverts to the previous position if the result differs from pos_before. In that case, fgets is used to read the new data from the file.

Hence, only the tail of the file is scanned for changes, which keeps the whole process fairly lightweight. However, it is exposed to the potential problem that the long-lived open file pointer may become invalid if the file system changes underneath it (see above).
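For reference, the tail-polling approach described above can be condensed into a small helper (a minimal sketch; the function name is mine, and error handling is reduced to a -1 sentinel). Note that fseek() itself returns 0 or -1, not the offset; the position must be read with ftell():

```c
#include <stdio.h>

/* Sketch of the polling step: seek to the end of the stream and
   report the current size via ftell(). Returns -1 on error. */
long tail_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    return ftell(fp);
}
```

Comparing the result against the previously recorded position tells you whether new data was appended.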

The code does not need to be portable to any non-Linux/Unix systems.

Question:

  • How can I detect whether the file pointer is still valid after the file was opened successfully (which may have been weeks ago)? I have seen that one might be able to use fcntl(fileno(filepointer), F_GETFD) for testing.
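A minimal sketch of that fcntl() test (the helper name is mine; assuming a Linux/Unix target). Note that F_GETFD only tells you whether the descriptor is still open in this process, not whether the underlying file is still reachable:

```c
#include <fcntl.h>
#include <stdio.h>

/* Returns non-zero if the descriptor behind fp is still open.
   F_GETFD returns the descriptor flags, or -1 on a bad descriptor. */
int fd_still_open(FILE *fp)
{
    return fcntl(fileno(fp), F_GETFD) != -1;
}
```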

Alternative question:

  • Would it be feasible to detect changes in the file size in an alternative way? I could think of using periodically
    • fseek(filepointer, 0L, SEEK_END); (might be very slow and cause a lot of I/O), or
    • _filelength(fileno(filepointer)); (unclear if this will cause lots of I/O)
    • stat(filename, &st); st.st_size; (unclear if this will cause any I/O)
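A fourth option, close in spirit to _filelength() (which is a Windows CRT call and not available on Linux), would be fstat() on the already-open descriptor. A sketch (the helper name is mine): this reports the size of the inode you actually opened, even if the name now points elsewhere, and costs no data I/O since the size is metadata:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file behind an open stream, via fstat() on its
   descriptor. Returns -1 on error. */
long open_file_size(FILE *fp)
{
    struct stat st;
    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}
```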
MrD
    I don't have Linux, but in Windows you cannot delete or move a file that has been opened by another process and I would be surprised if this isn't also the case with 'nix systems. As for removing the media where an open file is, this is one reason why you should always check the result of I/O operations. (And ask the system for permission to remove the media). – Weather Vane Apr 09 '17 at 12:25
  • @WilliamPursell I did not say that, and you can (for example when defragmenting a volume). I said you cannot delete a file that is open, for very obvious reasons. – Weather Vane Apr 09 '17 at 12:34
  • @WeatherVane What is the difference between "mov[ing] a file" and changing names? – William Pursell Apr 09 '17 at 12:35
  • Did you check the answer of George Carrette in http://stackoverflow.com/questions/551069/testing-pointers-for-validity-c-c . In the end it recommends `stat()`. – benni Apr 09 '17 at 12:37
  • It's not at all obvious to me why you can't delete a file when a process has that file open. Does windows not support the abstraction of links? Is there only one name for each file? – William Pursell Apr 09 '17 at 12:38
  • @WilliamPursell I am not going to enter the "Linux good, Windows bad" debate (or "gcc C extensions good, MSVC C extensions bad.") – Weather Vane Apr 09 '17 at 12:38
  • 1
    I'm not trying to argue "Windows bad". I'm trying to understand a system that I have little experience with. Is there only one name for each file? – William Pursell Apr 09 '17 at 12:39
  • I have never considered the confusing possibility that there could be, although "shortcuts" may have another name. – Weather Vane Apr 09 '17 at 12:40
  • So changing the name of a file requires actually moving data on the underlying medium? – William Pursell Apr 09 '17 at 12:41
  • @WeatherVane (and @William), look up the concept of [hard links](https://en.wikipedia.org/wiki/Hard_link). On the kernel level, current Windows systems, and NTFS as a file system, support some Unix-like features that aren't really used much in practice, so it's not even a Win-vs-Linux thing. – ilkkachu Apr 09 '17 at 12:42
  • Why do you think `fseek` might be very slow? It basically does a stat and set the offset to the length of the file. – Jean-Baptiste Yunès Apr 10 '17 at 05:11
  • @Jean-BaptisteYunès I was assuming that it iterates through the file until it finds its end. However, looking up the implementation, I see that it uses `lseek` and I might have mixed that up with `fgets` (which actively has to search for a newline character). – MrD Apr 10 '17 at 09:32
  • File length is one of the meta-data and doesn't necessitate to read all the file. – Jean-Baptiste Yunès Apr 10 '17 at 13:24

2 Answers

3

Well, usually an open file will prevent unmounting the filesystem, so it shouldn't just disappear under you. Though with USB disks etc, there is of course the possibility of the user pulling the device without asking the system.

But it would be nice for the process not to prevent a clean unmount. That requires two things:

  1. Don't keep the file open
  2. Don't keep the containing directory as the process's working directory.

Running stat(2) periodically on the path would be the way to do this. You can detect changes to the file from modifications to mtime, ctime, and the file size. Errors, or changes in the inode number or the containing device (st_dev), may indicate the file is no longer accessible or isn't the same file any more. React depending on application requirements.

(That is, assuming you're interested in the file currently pointed to by that name, and not in the same inode you opened.)
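A sketch of such a polling check (the helper name and return convention are mine; assuming you are interested in the current file at that path). It flags a change in size, identity, or mtime, and distinguishes "changed" from "gone":

```c
#include <stdio.h>
#include <sys/stat.h>

/* Poll the path: returns 1 if the file changed since *prev,
   0 if unchanged, -1 if stat() failed (file gone or unreachable).
   Updates *prev on success. */
int file_changed(const char *path, struct stat *prev)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    int changed = st.st_size  != prev->st_size
               || st.st_ino   != prev->st_ino   /* replaced by a new file? */
               || st.st_dev   != prev->st_dev   /* different filesystem?   */
               || st.st_mtime != prev->st_mtime;
    *prev = st;
    return changed;
}
```

The caller initializes *prev with one stat() call at startup and then calls this in the polling loop.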

As for I/O, it's likely that periodically stat()ing something will keep the inode cached in memory, so the issue is more about memory use than I/O. (Until you do this on enough files that they can no longer all be cached, which leads to thrashing: an issue of both memory and I/O...) Seeking to the end of the file similarly only requires loading the length of the file; I can't see why that would cause any significant I/O.

Another choice would be to use inotify(7) on the file or the whole directory to detect changes without polling. It can also detect unmount events.
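The inotify setup can be sketched as follows (Linux-only; the helper name is mine and error handling is minimal). Once the watch is established, a plain read() on the inotify descriptor blocks until the file is modified or its filesystem is unmounted, so no polling is needed:

```c
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Start watching a single file for modification and unmount events.
   Returns the inotify descriptor to read events from, or -1 on error. */
int watch_file(const char *path)
{
    int ifd = inotify_init();
    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, path, IN_MODIFY | IN_UNMOUNT) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}
```

The caller then read()s struct inotify_event records from the returned descriptor and checks the mask for IN_MODIFY or IN_UNMOUNT.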

ilkkachu
  • The particular issue in my case is that the open file (depending on the installation of the embedded system) may be placed in a ramdisk which sees a hard-reset after a certain time. This is neither friendly nor controllable by my application. I already experimented with `inotify` but it gave me a high CPU load on some `arm` targets, so I abandoned it. I'm only concerned with one single file which can become huge in size (a log file in /var/log). I'll try using `stat`. – MrD Apr 09 '17 at 13:59
2

How can I detect if the file pointer is still valid after having opened the file successfully

If the FILE* was not explicitly fclose()d by the process that opened (or inherited) it, and if the process in question did not invoke undefined behaviour, the FILE* is valid by definition.

In case any underlying layer cannot fulfil requests issued on the FILE* fp (typically raised indirectly via calls into libc like fread(), fwrite() or fseek(), or directly by doing, for example, read(fileno(fp))), the failing functions return an error indication and set errno accordingly; typically this would be EIO.

Just implement complete error checking and handling and you won't run into any issues.
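A sketch of the error handling this answer recommends (the wrapper name is mine): after a short fread(), distinguish end-of-file from a real I/O error, such as EIO when the underlying device has disappeared:

```c
#include <errno.h>
#include <stdio.h>

/* Read up to len bytes; on a genuine I/O error store errno in *err
   (often EIO if the device vanished), otherwise set *err to 0.
   A short read with *err == 0 simply means end-of-file. */
size_t checked_read(FILE *fp, char *buf, size_t len, int *err)
{
    errno = 0;
    size_t n = fread(buf, 1, len, fp);
    if (n < len && ferror(fp)) {
        *err = errno;
        clearerr(fp);
    } else {
        *err = 0;
    }
    return n;
}
```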

Jonathan Leffler
alk