In the following code:

#!/bin/bash

if [ -f "$file" ]
then
    stat --printf="%s\n" "$file"
    cat "$file"
else
    echo -1
fi

$file is the name of a binary file that could be deleted at any point.

My biggest fear is that the file could be deleted after the [ -f "$file" ] test but before cat "$file" is executed, and the result would be incorrect.

But I also wonder what will happen if the file is deleted during the execution of cat "$file". Will it be output fully or only partially, and is there a risk of reading unrelated data if $file gets overwritten on the drive? man cat does not explain that. Edit: https://stackoverflow.com/a/2031100/4503330

How can I guarantee that the output is one of the following?

  • The size of the file, followed by a new line and the content of the file
  • -1

Note: the size of the file could be up to 5MiB and making a copy of it would be too slow.

Edit: The file is created with ffmpeg ... -window_size 5 -extra_window_size 0 -min_seg_duration 2000000 -f dash ... which in my case keeps up to 5 files at a time in a particular directory. The files never reuse the same name and they follow this cycle (entirely controlled by ffmpeg): 1) created with a .tmp extension 2) renamed without .tmp 3) (at least 10 seconds later) deleted

nd97
  • Possible duplicate of [What happens to an open file handler on Linux if the pointed file gets moved, delete](https://stackoverflow.com/questions/2028874/what-happens-to-an-open-file-handler-on-linux-if-the-pointed-file-gets-moved-de) – arco444 Nov 20 '17 at 14:10
  • Please edit your question to motivate it more, and explain what you are trying to do and what other processes could practically access / modify / unlink that file – Basile Starynkevitch Nov 20 '17 at 14:26

2 Answers

You cannot get that guarantee in bash (that the output is either the entire file prefixed by its size, or else -1) because, as you mentioned, something can happen between the two commands (which run as separate processes).

BTW, that file might be truncated by some other process (doing ftruncate(2)...), so you cannot have any guarantee on getting the "totality" of the content.

You might consider using advisory locking (e.g. with flock(2) or lockf(3) ...; consider also flock(1) in a shell script), which works well only when all programs changing that file agree on that locking (so you need to adopt a whole system convention).
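As a rough sketch of what the flock(1) variant could look like in a script: the helper name read_locked and the choice of descriptor 9 are only illustrative, and this assumes GNU stat and a Linux-style /dev/fd. It only helps if every process that modifies or deletes the file takes the matching exclusive lock, which a program you cannot change will not do.

```shell
#!/bin/bash
# read_locked is a hypothetical helper name, not an existing tool.
# Prints "<size>\n<content>" of the file named by $1, or -1 if it cannot
# be opened or locked.
read_locked() {
    local file=$1
    (
        # Take a shared (read) lock on descriptor 9. Writers following the
        # same convention would take an exclusive lock with `flock -x`.
        flock -s 9 || exit 1
        # Size and content both come from the already-open descriptor.
        stat -L --printf='%s\n' /dev/fd/9 && cat <&9
    ) 2>/dev/null 9< "$file" || echo -1
}
```

The lock is released when the subshell exits and descriptor 9 is closed.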

Perhaps you want to use some RDBMS server providing ACID guarantees.

> But I also wonder what will happen if the file is deleted during the execution of cat "$file". Will it be output fully or only partially, and is there a risk of reading unrelated data if $file gets overwritten on the drive?

No, there is no risk of reading unrelated data. A process running cat (probably the /bin/cat program, see cat(1)) keeps an open file descriptor on $file. Deleting the file only removes its name; the data won't be released (or reused) as long as some open file descriptor refers to that file, so cat can still read it to the end.

Perhaps you could write a simple C program (which runs in the same process, in contrast to several commands in some shell script) which opens the file, uses fstat(2) (perhaps via fileno(3) if you use stdio functions) on the opened file descriptor, and loops to copy its content. That does not protect you from hostile ftruncate(2) done by other processes during that copy.

If you don't care about truncation or overwriting and only about premature rm (or unlink(2)) you might use a temporary additional hard link. Perhaps as simple as:

 newhardlink=".newhardlink$$"
 ln "$file" "$newhardlink"
 stat --printf="%s\n" "$newhardlink"
 cat "$newhardlink"
 rm "$newhardlink"

If you are worried that the current directory is on a different filesystem (hard links cannot cross filesystems), then you might do

 mydir=$(dirname "$file")
 newhardlink="$mydir/.newhardlink$$"

instead of newhardlink=".newhardlink$$", and you could use a trap to make sure the final cleanup rm "$newhardlink" runs in all cases.
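Putting the hard-link idea, the same-directory link name and the trap together could look roughly like this. It is a sketch, not a hardened implementation: print_size_and_content is a made-up helper name, GNU stat is assumed, and it relies on the file content never changing once the file has its final name.

```shell
#!/bin/bash
# print_size_and_content is a hypothetical helper name.
# Prints "<size>\n<content>" of the file named by $1, or -1 if the file
# is already gone (or the link cannot be created).
print_size_and_content() {
    local file=$1
    local link
    # Create the link next to the file so both are on the same filesystem.
    link="$(dirname -- "$file")/.newhardlink.$$"
    (
        # Remove the extra link whenever this subshell exits.
        trap 'rm -f -- "$link"' EXIT
        # ln fails if the file was deleted before we got here.
        if ln -- "$file" "$link" 2>/dev/null; then
            # From here on the data stays reachable through our own link,
            # even if the original name is removed.
            stat --printf='%s\n' -- "$link"
            cat -- "$link"
        else
            echo -1
        fi
    )
}
```

Checking the exit status of ln replaces the separate existence test, so there is no window between "check" and "use".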

Also be aware of inotify(7) (probably overkill for your situation).

Better yet, change the way ffmpeg is started so that it uses some temporary file (see mktemp(1), mkstemp(3)).

Or use chepner's subshell trick, and in that subshell run stat --printf="%s\n" -L /dev/stdin just before the cat.
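A possible shape for that last variant is sketched below. The function name print_open_file is invented for illustration, and it assumes GNU stat plus a Linux-style /dev/stdin that resolves to the open descriptor. The file is opened exactly once, so the size and the content come from the same open file even if its name is removed in the meantime.

```shell
#!/bin/bash
# print_open_file is a hypothetical helper name.
# Prints "<size>\n<content>" of the file named by $1, or -1 if it cannot
# be opened.
print_open_file() {
    {
        # Both commands use the descriptor opened by the redirection below,
        # not the file name, so a later unlink does not affect them.
        stat -L --printf='%s\n' /dev/stdin && cat
    } 2>/dev/null < "$1" || echo -1
    # If the redirection fails, the group never runs and -1 is printed.
}
```

Note that -1 is also printed after partial output if stat or cat itself fails, which this sketch does not try to distinguish from a failed open.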

Basile Starynkevitch
  • If it helps, I know that the file content won't change and also know the PID of the only process that could delete it. But I can't edit its source. – nd97 Nov 20 '17 at 14:24
  • But if that process does an `ftruncate` you are stuck. Explain a bit more what is that other process. – Basile Starynkevitch Nov 20 '17 at 14:26
  • In my case `ftruncate` cannot happen, I edited the question. – nd97 Nov 20 '17 at 14:35
  • Thanks, `ln` should help, but I will still have to check if it failed or not but that's a minor change – nd97 Nov 20 '17 at 14:45

The solution is to not check if the file exists; just try to open it, and deal with any errors in opening the file. This is easiest to do in a subshell if that is feasible:

(
    exec < foo || exit 1
    cat
)

If you actually need to use stat, it's a bit tricky. BSD stat will process the file attached to standard input if no argument is given, but GNU stat (as far as I can tell) must be given an existing file name.

chepner