
I'm trying to run

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | awk '!seen[$0]++' >> log.txt

or equivalently (by using uniq):

stdbuf -i0 -o0 -e0 fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | uniq >> log.txt

The goal is to avoid duplicate rows in the output. Both pipelines work just fine in the terminal, writing to standard output; however, when I try to redirect the output to log.txt, the file stays blank (or no new rows are appended when using >>).

fswatch is a command that monitors changes to the filesystem in real time. It generates a lot of duplicate events, and uniq seems to handle those just fine.

Any ideas why the output redirection doesn't work?

Ulrich Eckhardt

2 Answers


awk and uniq are going to buffer their output when writing to a regular file. You can get unbuffered behavior with perl:

... | perl -ne '$|=1; print unless ${$_}++'

That is the perl equivalent of awk '!seen[$0]++', but setting $| to a non-zero value makes the output unbuffered. To be more correct, you should probably write BEGIN{$|=1} so the assignment isn't made on every line of input, but it's not strictly necessary.
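Putting that together with the command from the question, a minimal sketch of the full pipeline might look like this (the paths and extensions are taken verbatim from the question):

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | perl -ne 'BEGIN{$|=1} print unless ${$_}++' >> log.txt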

William Pursell
  • You're right, it's due to the buffering of the text processing command. BTW, keeping almost all the lines in memory isn't a good idea – Fravadona Jan 11 '22 at 22:30
  • *BTW, keeping almost all the lines in memory isn't a good idea* It would be trivial to replace `${$_}++` with something like `${md5($_)}` which is a fraction of the footprint... – dawg Jan 12 '22 at 01:13
  • @dawg I didn't think of using a hash, that would save a lot of RAM. The main problem remains though: the consumption is infinite because the lines are all timestamped – Fravadona Jan 12 '22 at 07:56
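Following up on dawg's comment, a sketch of what the md5 variant might look like (the Digest::MD5 import and the symbolic-reference trick here are assumptions based on the comment, not part of the original answer; each stored key shrinks to 16 bytes, but memory use still grows without bound because the timestamped lines never repeat):

... | perl -MDigest::MD5=md5 -ne 'BEGIN{$|=1} print unless ${md5($_)}++'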

The formatting didn't look right in a comment, so I'm re-pasting it here as an answer for clarity:

mawk '!__[$_]--{ print; fflush() }'
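For reference, a sketch of how this filter could replace the uniq stage in the pipeline from the question (in awk, _ is an uninitialized variable, so $_ is just $0, and fflush() makes mawk flush its output after every printed line; GNU awk supports fflush() as well):

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | mawk '!__[$_]--{ print; fflush() }' >> log.txt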
RARE Kpop Manifesto