
I'm writing a file that takes minutes to write. External software watches for this file to appear. Unfortunately, it does not wait for an inotify IN_CLOSE_WRITE event; instead it periodically checks whether the file exists and then starts processing it, which fails if the file is incomplete. I cannot fix the external software. The workaround I've been using so far is to write to a temporary file and rename it once it's finished, but this complicates my workflow for reasons beyond the scope of this question¹.
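For reference, the rename workaround mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actual code in use: the function name `write_atomically` and the `.part` suffix are hypothetical, and it assumes the temporary name lives in the same directory (and so on the same filesystem) as the destination and that the external software ignores names ending in `.part`.

```python
import os


def write_atomically(dest, chunks):
    """Write chunks to a temporary name, then rename it to dest."""
    tmp = dest + ".part"  # hypothetical suffix; must be on the same filesystem as dest
    with open(tmp, "wb") as fp:
        for chunk in chunks:
            fp.write(chunk)
        fp.flush()
        os.fsync(fp.fileno())  # push the data to disk before the name appears
    # rename(2) within one filesystem is atomic: dest is either absent or complete
    os.replace(tmp, dest)
```

The drawback described in the footnote remains: something still has to perform the rename step after each file is finished.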

Files are not directory entries. Using hardlinks, there can be multiple pointers to the same file. When I open a file for writing, both the inode and the directory entry are created immediately. Can I prevent this? Can I postpone the creation of the directory entry until the file is closed, rather than when the file is opened for writing?

Example in Python, although the question is not specific to Python:

fp = open(dest, 'w')  # currently both inode and directory entry are created here
fp.write(...)
fp.write(...)
fp.write(...)
fp.close()  # I would like to create the directory entry only here

Reading everything into memory and then writing it all in one go is not a good solution, because writing will still take time and the file might not fit into memory.

I found the related question "Is it possible to create an unlinked file on a selected filesystem?", but I want to first create an anonymous/unnamed file and then name it when I'm done writing (I agree with the answer there that creating an inode is unavoidable, and that's fine; I just want to postpone naming it).
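To make "create unnamed, name later" concrete, here is a hedged sketch of one Linux mechanism that appears to match it: opening the target directory with `O_TMPFILE` and later linking the descriptor into place through `/proc/self/fd`, as described in the open(2) manual page. The function name `write_then_link` is hypothetical. The sketch assumes a Linux kernel and filesystem that support `O_TMPFILE`, a mounted `/proc`, and that `dest` does not already exist (linking fails with `EEXIST` otherwise).

```python
import os


def write_then_link(dest, chunks, mode=0o644):
    """Write chunks to an unnamed file, then give it the name dest."""
    directory = os.path.dirname(os.path.abspath(dest))
    dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # O_TMPFILE creates an inode on directory's filesystem without a directory entry
        fd = os.open(directory, os.O_TMPFILE | os.O_WRONLY, mode)
        with os.fdopen(fd, "wb") as fp:
            for chunk in chunks:
                fp.write(chunk)
            fp.flush()
            # Name the file only now. dst_dir_fd is passed so the call goes through
            # linkat(2) with AT_SYMLINK_FOLLOW; as far as I know, a plain
            # os.link(src, dst) uses link(2) on some CPython versions, which does
            # not follow the /proc symlink.
            os.link(
                f"/proc/self/fd/{fp.fileno()}",
                os.path.basename(dest),
                dst_dir_fd=dir_fd,
                follow_symlinks=True,
            )
    finally:
        os.close(dir_fd)
```

If the process dies before the `os.link` call, the unnamed inode is released when the descriptor is closed, so no partial file should ever become visible under `dest`.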

I'm tagging this as Linux-specific because I suspect the answer differs between Linux and Windows, and I only need a solution on Linux.


¹Many files are produced in parallel within dask graphs, and injecting a "move as soon as finished" task in our system would be complicated, so we're really renaming 50 files when 50 files have been written, which causes delays.

gerrit
  • *but unfortunately doesn't monitor for inotify `IN_CLOSE_WRITE` events* Be glad it doesn't - that's **unreliable** because there's no actual indication that the file is complete. Getting notified that file is closed is meaningless - you have no indication of **why** it was closed. – Andrew Henle Nov 04 '22 at 12:29
  • *Many files are produced in parallel within dask graphs, and injecting a "move as soon as finished" task in our system would be complicated, so we're really renaming 50 files when 50 files have been written, which causes delays.* That's what you get for misusing a state-based data store such as a filesystem as an event-based messaging channel. – Andrew Henle Nov 04 '22 at 12:31
  • @AndrewHenle Right, perhaps ideally the code running inside dask should send a message "file successfully written". The solution I'm exploring in my answer is not ideal. I'm currently in the phase of exploring what's possible, then in a next step I can weigh the pros and cons of the different alternatives. Even if what I ask for in this question is possible, I might not end up using it at all. – gerrit Nov 04 '22 at 12:38
  • https://stackoverflow.com/a/4174216/10678955 – root Nov 06 '22 at 07:17
