16

I have a program in which I load text from a file and then filter it according to one of the fields. What I am interested in is the size of the data after this filtering step.

Ideally, I would be able to do something like: awk '$2>=10' <myfile | du -

I could just apply the filter and save the output somewhere, call du on it, and then delete the file, but the file is quite large, so writing to disk could take a while.

Recognizing that du stands for "disk usage", I suspect I am asking something that makes no sense, given how the program works. If there is another common utility that will do this, please suggest it!

reo katoa

3 Answers

23

You can pipe it to wc -c to count the number of bytes that go through the pipeline.

Brian Campbell
    Piping `wc`'s output to `awk '{print $1/1000"K"}'` gives nice, human-readable output. For example: `cat ~/.bashrc | wc -c | awk '{print $1/1000"K"}'`. More info [at the Unix Stack Exchange.](https://unix.stackexchange.com/questions/206733/how-can-i-get-the-size-of-stdin/206734#206734) – GDP2 Jun 12 '18 at 20:01
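The one-liner in the comment above always divides by 1000, so large counts still come out in "K". A minimal sketch of a variant that picks the unit automatically follows; the function name `human_size` and the sample byte count are made up for illustration, and it assumes decimal (power-of-1000) units and a locale that uses `.` as the decimal separator.

```shell
# Hypothetical helper: read a byte count on stdin and print it with a
# decimal unit suffix (B, K, M, G, T), scaling by 1000 at each step.
human_size() {
  awk '{
    split("B K M G T", unit, " ")
    size = $1
    i = 1
    while (size >= 1000 && i < 5) {
      size /= 1000
      i++
    }
    printf "%.1f%s\n", size, unit[i]
  }'
}

# Illustrative input; in practice this would be the output of `wc -c`.
printf '%s\n' 1536000 | human_size
```

In a real pipeline this would sit at the end, for example `awk '$2>=10' < myfile | wc -c | human_size`.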
5

du stands for "disk usage". Data in a pipe never reaches the disk, so there is nothing for du to measure. Use wc ("word count") instead:

awk '$2>=10' < myfile | wc -c

The -c flag counts bytes.
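As a quick sanity check, the same pipeline can be tried on a few lines of made-up input. This is only a sketch: the three sample lines and the variable name `filtered_bytes` are illustrative assumptions standing in for the real file.

```shell
# Hypothetical three-line input standing in for myfile; only the lines whose
# second field is >= 10 ("b 10" and "c 20") pass the filter.
filtered_bytes=$(printf 'a 5\nb 10\nc 20\n' | awk '$2>=10' | wc -c)

# Some wc implementations pad the count with spaces; arithmetic expansion
# normalizes it to a plain integer.
filtered_bytes=$((filtered_bytes))

echo "$filtered_bytes"
```

Each surviving line is four characters plus a newline, so the count covers only the filtered output, and nothing is written to disk along the way.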

Dag Høidahl
Marc B
0

In zsh, you can:

du -h =(cat myfile)

zsh saves the output of the command inside =(...) to a temporary file and substitutes that file's path for =(...). The file is deleted once the command completes.

HappyFace