Being naive, I wanted to parse 50 files using awk, so I did the following:
zcat dir_with_50files/* > huge_file
cat huge_file | awk '{parsing}'
Of course, this was terrible: it spent time creating a file on disk and then, as far as I could tell, consumed a whole bunch of memory passing that file along to awk.
Then a coworker showed me that I could do this instead:
zcat dir_with_50files/filename{0..50} | awk '{parsing}'
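For anyone who wants to reproduce the comparison, here is a minimal sketch of both approaches. The sample data, the temporary directory, and the `END { print NR }` line count standing in for my real parsing are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: three small gzip files in a temp directory.
dir=$(mktemp -d)
for i in 0 1 2; do
    printf 'line %s-a\nline %s-b\n' "$i" "$i" | gzip > "$dir/filename$i.gz"
done

# Approach 1: decompress everything into an intermediate file, then parse it.
zcat "$dir"/filename*.gz > "$dir/huge_file"
via_file=$(cat "$dir/huge_file" | awk 'END { print NR }')

# Approach 2: stream the decompressed data straight into awk.
via_pipe=$(zcat "$dir"/filename*.gz | awk 'END { print NR }')

echo "via file: $via_file, via pipe: $via_pipe"
rm -rf "$dir"
```

Both variables should hold the same line count; the only difference is whether the data passes through `huge_file` first.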
I was amazed that I got the same results without the memory consumption. ps aux also showed that the two commands ran in parallel. I was confused about what was happening, and this SO answer partially answered my question:
https://stackoverflow.com/a/1072251/6719378
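The parallel start that ps aux showed can also be seen in a small sketch. The log file and the marker strings are hypothetical, and the `sleep 1` is only there so the ordering is easy to observe.

```shell
#!/bin/sh
# Each side of the pipe appends markers to a shared log (hypothetical file).
log=$(mktemp)

{ echo "producer start" >> "$log"; sleep 1; echo "payload"; echo "producer end" >> "$log"; } |
{ echo "consumer start" >> "$log"; cat > /dev/null; echo "consumer end" >> "$log"; }

result=$(cat "$log")
rm -f "$log"
echo "$result"
```

The first two markers can come out in either order, but "consumer start" is logged before "producer end", so the right-hand command is already running while the left-hand one is still producing data.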
But if a pipe knows to start the second command once a certain amount of data has been buffered, why does my naive approach consume so much more memory than the second approach?
Is it because I am using cat on a single file rather than loading multiple files?