
I'm trying to run

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | awk '!seen[$0]++' >> log.txt

or equivalently (by using uniq):

stdbuf -i0 -o0 -e0 fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | uniq >> log.txt

The goal is to avoid duplicate rows in the output. Both pipelines work just fine in the terminal, writing to standard output; however, when I try to redirect the output to log.txt, the file stays blank (or no new rows are appended when using >>).

fswatch is a command that monitors changes to the filesystem in real time. It generates a lot of duplicate events, and uniq seems to handle those just fine.

Any ideas why the output redirection doesn't work?

Ulrich Eckhardt

2 Answers


awk and uniq are going to buffer their output when writing to a regular file. You can get unbuffered behavior with perl:

... | perl -ne '$|=1; print unless ${$_}++'

That is the perl equivalent of awk '!seen[$0]++', but setting $| to a non-zero value makes the output unbuffered. To be more correct, you should probably write BEGIN{$|=1} so the assignment isn't made on every line of input, but it's not strictly necessary.
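Putting that together with the command from the question, a minimal sketch of the full pipeline might look like this (the paths and extensions are taken verbatim from the question):

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | perl -ne 'BEGIN{$|=1} print unless ${$_}++' >> log.txt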

William Pursell
  • You're right, it's due to the buffering of the text processing command. BTW, keeping almost all the lines in memory isn't a good idea – Fravadona Jan 11 '22 at 22:30
  • *BTW, keeping almost all the lines in memory isn't a good idea* It would be trivial to replace `${$_}++` with something like `${md5($_)}` which is a fraction of the footprint... – dawg Jan 12 '22 at 01:13
  • @dawg I didn't think of using a hash, that would save a lot of RAM. The main problem remains though: the consumption is infinite because the lines are all timestamped – Fravadona Jan 12 '22 at 07:56
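Following up on dawg's comment, a sketch of what the md5 variant might look like (the Digest::MD5 import and the symbolic-reference trick here are assumptions based on the comment, not part of the original answer; each stored key shrinks to 16 bytes, but memory use still grows without bound because the timestamped lines never repeat):

... | perl -MDigest::MD5=md5 -ne 'BEGIN{$|=1} print unless ${md5($_)}++'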

The formatting didn't look right in a comment, so I'm re-pasting it here as an answer for clarity:

mawk '!__[$_]--{ print; fflush() }'
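For reference, a sketch of how this filter could replace the uniq stage in the pipeline from the question (in awk, _ is an uninitialized variable, so $_ is just $0, and fflush() makes mawk flush its output after every printed line; GNU awk supports fflush() as well):

fswatch -tr /home/*/*/public_html | grep --line-buffered -E ".php|.xml" | mawk '!__[$_]--{ print; fflush() }' >> log.txt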
RARE Kpop Manifesto