
I'm testing a random number generator and need to pass its output to various tests. Since the RNG is relatively slow compared to the tests, and I need to test 0.5-1 TB of data, I came up with the idea of using tee to pass the data from the RNG to all the tests. The main benefit is that I only need to generate the data once. The command is:

./RNG | tee >(test1) >(test2) >(test3) >/dev/null

However, it does not work as expected. When, for example, test1 finishes, tee stops all the other tests, even if they need more data to finish.

You can see the problem with this command:

cat /dev/zero | tee >(head -c200M | md5sum) >(head -c10M | sha1sum) | wc -c

Output is: 10559568

I would expect tee to finish only after all child processes have finished, but that's not the case. It stops as soon as the first process finishes (in this case head -c10M | sha1sum). What can I do to change this behaviour?
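One way to see what actually ends the pipeline is to look at the exit statuses that bash records in `PIPESTATUS`. The following is a minimal sketch, assuming bash, GNU `head` and `tee`, and default SIGPIPE handling in the calling environment; a bounded `head -c 50M /dev/zero` stands in for the RNG so that the script always terminates, and the variable names are illustrative.

```bash
#!/usr/bin/env bash
# Bounded stand-in for ./RNG; the only consumer needs a single byte.
head -c 50M /dev/zero | tee >(head -c 1 >/dev/null) >/dev/null
status=("${PIPESTATUS[@]}")   # copy immediately; the next pipeline overwrites it

gen_status=${status[0]}
tee_status=${status[1]}
# An exit status of 128 + N means "killed by signal N"; SIGPIPE is 13 on Linux.
echo "generator: $gen_status, tee: $tee_status"
```

On a typical Linux system both values should be 141 (128 + 13), which indicates that tee was killed by SIGPIPE once its reader went away, and that the generator was then killed the same way.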

4ae1e1
Jirka
  • I think this is related to the problem described in [this question](http://stackoverflow.com/questions/33753281/bash-anonymous-pipes). – chepner Nov 18 '15 at 19:16
  • `tee` exits when one of the readers closes its end of the pipe (`tee` then gets `SIGPIPE` on its next write), which in turn results in `SIGPIPE` on `RNG`'s end. You won't have this problem as long as none of the pipes is closed on the reading end. This can be easily achieved by a `; cat >/dev/null`, for instance. Try `cat /dev/zero | tee >(head -c200M | sha1sum) >(head -c10M | sha1sum; cat >/dev/null) | wc -c` and you'll get `209781760`. – 4ae1e1 Nov 18 '15 at 19:20
  • The proposed workaround works only as long as it's known which test will consume the most data. What I really want is for `tee` to continue as long as at least one pipe is open, and to avoid output to stdout as well. I was able to accomplish this by editing the `tee` source code. It was easy: I added `signal (SIGPIPE, SIG_IGN);` and commented out the following code, `if (fail) { error (output_error == output_error_exit || output_error == output_error_exit_nopipe, w_errno, "%s", files[i]); }`, as well as the opening of stdout for writing. I will ask the `tee` developers to add a command line switch for this. – Jirka Nov 18 '15 at 21:52
  • BTW, I now have a better test case to illustrate the problem: `cat /dev/zero | tee >(head -c200M | wc -c ) >(head -c1 | wc -c) >/dev/null` Output is `1 73728` – Jirka Nov 18 '15 at 21:53
  • It's almost worth trying an unmodified `tee` launched with SIGPIPE ignored: `./RNG | (trap '' PIPE; tee >(test1) >(test2) >(test3) >/dev/null)`. However, the version I've got continues to attempt writing to the failed descriptors, and it reports a message each time it fails. That's pretty graceless; it might legitimately warn once, and should stop attempting to write to a descriptor once it has failed. There's a 'process tee' (aka `pee`) lurking around — you could use that. I wrote a `tpipe` that does the same job, and gracefully stops writing to a descriptor once it gives an error. – Jonathan Leffler Nov 19 '15 at 01:16

1 Answer


Some versions of tee already support --output-error=warn, which means that output errors are diagnosed on stderr but tee doesn't quit on errors writing to pipes. Unfortunately, the tee on my laptop is very basic, so I cannot check whether this option handles your test case.
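For reference, GNU coreutils gained this option (and its short form `-p`) in version 8.24. The following is a minimal sketch of the question's reduced test case using it, assuming bash and a GNU `tee` new enough to accept `--output-error`; the sizes and the bounded `/dev/zero` source are illustrative stand-ins for the RNG and the real tests. It uses the `warn-nopipe` mode, which skips closed pipes without printing a diagnostic.

```bash
#!/usr/bin/env bash
# Sketch only: needs GNU tee with --output-error (coreutils 8.24 or later).
# The process substitutions inherit the pipe to `sort`, so `sort` (and with it
# the command substitution) only finishes once both consumers have exited.
counts=$(
  head -c 20M /dev/zero |
    tee --output-error=warn-nopipe \
      >(head -c 10M | wc -c) \
      >(head -c 1 | wc -c) \
      >/dev/null |
    sort -n
)
printf '%s\n' "$counts"
```

If the option behaves as documented, the two lines printed are `1` and `10485760`, meaning the larger consumer still received its full 10 MiB after the smaller one had exited. With a plain `tee`, the larger count would typically come out short, as in the question.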

You will probably need to replace your final >/dev/null with the longest test case, or with something like >(head -c1000G); otherwise tee will keep going forever, piping data to /dev/null.
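A variation on this bounding idea can be sketched as follows, assuming bash and GNU `tee` from coreutils 8.24 or later, which exits once none of its outputs is writable any more. Here tee's stdout is piped into a `head` that sets the upper limit instead of going to /dev/null. The consumers write their results to file descriptor 3 so that they do not end up in that final `head`; the sizes and the use of fd 3 are illustrative choices, not part of the original answer.

```bash
#!/usr/bin/env bash
# Sketch only: relies on GNU tee (coreutils 8.24 or later) exiting once every
# output has failed. The source is endless; the final `head` bounds the run.
counts=$(
  {
    cat /dev/zero |
      tee -p \
        >(head -c 1M | wc -c >&3) \
        >(head -c 1 | wc -c >&3) |
      head -c 5M >/dev/null
  } 3>&1 | sort -n
)
printf '%s\n' "$counts"
```

Under those assumptions the script prints `1` and `1048576` and then terminates, because tee stops after the 5 MiB limit is reached and all other readers are gone. The limit has to be at least as large as what the hungriest test needs.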

Dmitry Grigoryev