We have a black-box third-party Java program that picks up input files from a location and makes PDFs. For each input it writes a manifest file to that same location every time, so we have to feed the input files in a controlled manner: if the manifest (or a .xen/.que file) still exists, we don't feed another input file.

We're getting VERY rare (one in tens of thousands of files) instances where our feed script finds nothing and feeds a file, and we then get an error because the manifest is overwritten and things don't match up. I wrote a Perl script that does nothing but print the time to five decimal places (hundred-thousandths of a second), glob everything in the directory that we care about, and print the result. Below you can see .xen and .que files, where .xen is the input and .que is a renamed version of it that indicates processing.
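For anyone who wants to reproduce the observation, here is a rough sketch of that watcher in Python rather than Perl. The function names and the glob patterns (`*.xen`, `*.que*`) are assumptions made for illustration, not the actual script.

```python
import glob
import os
import time


def snapshot(directory, patterns=("*.xen", "*.que*")):
    """Take one look at the directory: return (timestamp, sorted matching paths)."""
    matches = []
    for pattern in patterns:
        matches.extend(glob.glob(os.path.join(directory, pattern)))
    # A set removes any path matched by more than one pattern.
    return time.time(), sorted(set(matches))


def format_line(timestamp, paths):
    """Format one log line: time to five decimal places, then the paths found."""
    return "%.5f %s" % (timestamp, " ".join(paths))


def watch(directory, iterations=1000):
    """Print one snapshot per iteration, as fast as the loop will run."""
    for _ in range(iterations):
        print(format_line(*snapshot(directory)))
```

Calling `watch("/gpfs/fsdd/projects/corr_esch")` would print lines of the same general shape as the log below. Note that a glob is itself not an atomic view of the directory, so each line is only a best-effort sample.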

My question, then, is this: how is the lack of files at 94.26493 (the line reading only 1421417394.26493) possible? Does the OS hide a file while it is being renamed? Our problem occurs when the feed program looks for files at exactly that moment, so my planned hack is to check for files twice, hopefully slowly enough to catch either end of the rename. I should also point out that once two files show up on a line, the feed program has put another file in; that .xen is not the same file as the one before the rename.

1421417394.26392/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen
1421417394.26416/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen
1421417394.26442/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen
1421417394.26468/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen
1421417394.26493
1421417394.26907/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen.que_142_1421417394265
1421417394.27426/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen /gpfs/fsdd/projects/corr_esch/corr_esch.d.xen.que_142_1421417394265
1421417394.27456/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen /gpfs/fsdd/projects/corr_esch/corr_esch.d.xen.que_142_1421417394265
1421417394.27486/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen /gpfs/fsdd/projects/corr_esch/corr_esch.d.xen.que_142_1421417394265
1421417394.27528/gpfs/fsdd/projects/corr_esch/corr_esch.d.xen /gpfs/fsdd/projects/corr_esch/corr_esch.d.xen.que_142_1421417394265
pete1450
  • As far as I am aware, [rename is an atomic operation on POSIX systems](http://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows), so I would think that the file would "exist" both before and after the rename operation. Windows file operations are *not* generally atomic unless [Transactional NTFS](http://msdn.microsoft.com/en-us/library/windows/desktop/bb968806%28v=vs.85%29.aspx) is used, but since this question is tagged as "unix" I'm guessing that isn't relevant. – GoBusto Jan 16 '15 at 15:10
  • Correct, specifically AIX – pete1450 Jan 16 '15 at 15:14
  • Can you include the relevant portions of the perl program? – Ben Grimm Jan 16 '15 at 15:48
  • The feeder script or the one spitting out files that currently exist? – pete1450 Jan 16 '15 at 16:16

1 Answer

The actual guarantee in POSIX is that if you rename a to b and b already exists, there will be no point in time during the rename when b does not exist. It will refer either to the previously existing b or the new b formerly called a.

If b does not already exist (which appears to be the case in your example), then the guarantee doesn't apply. It is possible that there's a moment when neither a nor b exists (it depends on how the particular filesystem works). It's also possible that there's a moment when both a and b exist (and refer to the same file).
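To make the "both exist" case concrete, here is a small Python sketch of a rename built from two separate steps, a hard link followed by an unlink. This is purely illustrative: `rename_via_link` and its `observe` callback are hypothetical names, and nothing here is a claim about how GPFS or your Java program actually performs the rename.

```python
import os


def rename_via_link(src, dst, observe=None):
    """Rename src to dst in two steps: hard-link the new name, then unlink the old.

    Between the two steps both names exist and refer to the same file.
    `observe` is a hypothetical callback invoked inside that window.
    Unlike rename(), os.link() fails if dst already exists.
    """
    os.link(src, dst)       # dst now exists alongside src
    if observe is not None:
        observe(src, dst)   # another process could see both names here
    os.unlink(src)          # only dst remains
```

A filesystem that removes the old directory entry before adding the new one would show the opposite window, where neither name is visible. That ordering can't be reproduced portably from user code, which is why it is described rather than demonstrated here.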

Your proposed solution of checking twice with a short delay is probably the simplest approach.
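A minimal sketch of that double check, written in Python rather than the Perl of your feed script (which isn't shown). The function names, the glob patterns, and the default of two checks 10 ms apart are all assumptions for illustration. In your log the empty listing falls between samples about 4 ms apart (94.26468 and 94.26907), so the delay only has to outlast a window of roughly that size, but the right value for your system is something you would need to measure.

```python
import glob
import os
import time


def pending_files(directory, patterns=("*.xen", "*.que*")):
    """Return every path in the directory that should block the next feed."""
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(set(found))


def safe_to_feed(directory, checks=2, delay=0.01, patterns=("*.xen", "*.que*")):
    """Return True only if every one of `checks` looks finds the directory clear.

    Sleeping `delay` seconds between looks gives an in-flight rename time to
    finish, so at least one look should see the old or the new name.
    """
    for attempt in range(checks):
        if pending_files(directory, patterns):
            return False
        if attempt < checks - 1:
            time.sleep(delay)
    return True
```

This narrows the race rather than closing it: a rename window longer than `delay` could still slip through, so raising `checks` or `delay` trades feed latency for safety.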

cjm
  • I was hoping I was wrong. I read through the doc you provided and came to the same conclusion. Thanks for the confirmation. – pete1450 Jan 16 '15 at 16:45
  • It is atomic if it is the same file system (same mount point). If you are crossing file systems, then I would assume that there is a length of time when both exist but that might vary between systems and implementation. Also, all of the comments actually pertain to the rename system call which Java may or may not be choosing to use. And I also see "gpfs" in the path which is a remote file system so I would suggest all bets are off. :-) – pedz Jan 17 '15 at 20:31
  • @pedz, `rename` doesn't necessarily work across filesystems (there's an `EXDEV` error defined for OS's that don't support that). Also, the guarantee never did say anything about when `a` does or does not exist, only that `b` will always exist. – cjm Jan 18 '15 at 23:12
  • Yes, for the rename system call, that's true. But I thought the user was actually using Java. And the mv command on AIX uses rename if it can but does not if the move crosses file systems. And, on top of that, he is actually doing this on a remote file system, so really it is going to be gpfs that defines how this works. – pedz Jan 18 '15 at 23:54
  • The black-box program I mentioned is Java, so I can't guarantee how it does the rename. A plain old rename is just my best guess. You are correct about gpfs being the remote filesystem as well. I hadn't thought about the rename logic being decided there. – pete1450 Jan 20 '15 at 17:39