
Here's the self-explanatory one-liner I want to execute:

for f in *; do awk '{sub(FILENAME, FILENAME".")1}' $f > $f; done

This command does not work as intended: the output files are all empty. I searched the internet for the reason, and from what I found, a loop in Bash is treated as one single command, so the redirection is expected to go outside of it, after "done".

I then tried this, and the result is even more surprising:

for f in *; do awk '{sub(FILENAME, FILENAME".")1}' $f | tee $f; done

This does not work either, except that sometimes it works for one of the files in the directory, and not always the same one. I copy fresh copies of the files into the directory (I have them backed up somewhere else), run that one-liner, and file B is modified as expected (the others become empty). Then I copy fresh copies over again, rerun the command, and this time it is file C that gets modified as expected (the others are still empty). Other times, it does not work for even one file.

  1. Can you please tell me how I can achieve the desired result?
  2. What is happening with that second command?
John Kugelman
user9128
  • please describe in english what you're attempting to do for each file; what do you think `sub(FILENAME,FILENAME".")1` is supposed to do? both `$f > $f` and `$f | tee $f` are reading/writing-over the same file and as socowi's pointed out ... 'tis a recipe for disaster – markp-fuso Aug 06 '21 at 18:27
  • The reason the files are empty is this: bash processes redirections **before** launching the command. The `>` redirection truncates the file to zero size **before** awk is invoked, and it has an empty file to read from. – glenn jackman Aug 06 '21 at 18:31
  • See ["Why does `sort file > file` result in an empty file?"](https://stackoverflow.com/questions/30841387/why-does-sort-file-file-result-in-an-empty-file), ["How can I use a file in a command and redirect output to the same file without truncating it?"](https://stackoverflow.com/questions/6696842/how-can-i-use-a-file-in-a-command-and-redirect-output-to-the-same-file-without-t), and ["awk printing nothing when used in loop"](https://stackoverflow.com/questions/62330138/awk-printing-nothing-when-used-in-loop) – Gordon Davisson Aug 06 '21 at 19:02

1 Answer

Reading from a file while simultaneously overwriting it is a recipe for disaster. With `cmd $f > $f`, bash empties (truncates) the file before `cmd` even runs. `cmd $f | tee $f` may work for short files because `cmd` and `tee` run in parallel and the output of `cmd` is buffered. If you are lucky, your system executes `cmd`'s read operations before `tee`'s truncate operation. The bigger the file, the lower the chance of reading all of its data before `tee` truncates it.
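The truncation in the `cmd $f > $f` case can be reproduced in a throwaway directory. This is only a minimal sketch: the file name `demo.txt` and its contents are made up for illustration, and `awk '{ print }'` stands in for whatever command you actually run.

```shell
#!/usr/bin/env bash
# Sketch: show that the redirection empties the file before awk reads it.
# demo.txt and its contents are hypothetical sample data.
workdir=$(mktemp -d)
demo="$workdir/demo.txt"
printf 'one\ntwo\n' > "$demo"

size_before=$(wc -c < "$demo" | tr -d ' ')

# bash opens "$demo" for writing (truncating it) first, then starts awk,
# which finds an empty input file and therefore prints nothing.
awk '{ print }' "$demo" > "$demo"

size_after=$(wc -c < "$demo" | tr -d ' ')
echo "bytes before: $size_before, bytes after: $size_after"

rm -r "$workdir"
```

The file holds 8 bytes before the command and 0 bytes after it, regardless of what the awk program does.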

If you want to see this race condition between cmd's read operation and tee's truncate operation yourself, have a look at

head -c1M /dev/zero > f; LC_ALL=C strace -f -e execve,openat,read,write bash -c 'cat f | tee f' >/dev/null; wc -c f

My tee implementation from GNU coreutils 8.32 truncates the file by calling `openat(… "f" … O_TRUNC …)`. Once that call has succeeded, `cat`'s next `read` returns 0, signaling the end of the file.

In your case, there are three possible solutions:

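  • Use a temporary file which you rename afterwards (only if awk succeeded)
    awk ... "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  • Use GNU awk's inplace extension (gawk 4.1.0 or later)
    gawk -i inplace ... "$f"
  • Use sponge from moreutils (not part of GNU coreutils, so it may need to be installed separately)
    awk ... "$f" | sponge "$f"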
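As a rough sketch of the first option applied to a whole directory, the loop below writes each result to a temporary file and renames it only if awk succeeded. The sample files and the `toupper` awk program are placeholders I made up so the script runs as-is; substitute your own program and directory.

```shell
#!/usr/bin/env bash
# Sketch of the temporary-file approach in a loop.
# The sample files and the awk program are hypothetical stand-ins.
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/a.txt"
printf 'gamma\n'       > "$workdir/b.txt"

# The glob is expanded once, before the loop body runs, so the .tmp files
# created below are never picked up as input.
for f in "$workdir"/*; do
    [ -f "$f" ] || continue              # skip directories and the like
    if awk '{ print toupper($0) }' "$f" > "$f.tmp"; then
        mv -- "$f.tmp" "$f"              # replace the original only on success
    else
        rm -f -- "$f.tmp"                # leave the original untouched on failure
    fi
done

result_a=$(cat "$workdir/a.txt")
result_b=$(cat "$workdir/b.txt")
printf '%s\n' "$result_a" "$result_b"

rm -r "$workdir"
```

Quoting `"$f"` keeps file names with spaces intact. Note that `mv` replaces the file with a new one, so hard links to the original will still point at the old contents.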
Socowi
  • Thank you for your answer, that's what I was looking for. But then, I understand why `tee` wouldn't work as expected, but can you please explain why the files are empty (most of the time) when using `tee`? – user9128 Aug 07 '21 at 01:22
  • @user9128 I added more explanations. Whether or not the file is empty depends on how your system schedules the operations of programs running in parallel. `tee` is a simple program, so the operation that truncates the file comes very early, whereas `awk` does a lot of work before reading the file. Therefore, the chances of executing `awk`'s read before `tee`'s truncate are pretty slim. You can improve the chance by slowing `tee` down. `awk ... $f | { sleep 1; tee $f; }` will give you a near 100% chance that `awk` reads at least one buffer of `$f`'s content before `tee` truncates `$f`. – Socowi Aug 07 '21 at 12:04