
I have a large amount of text coming in on stdout and stderr; I would like to log all of it to a file (in the same order), and print to the console only what comes from stdout, for further processing (with grep, for example).

Any combination of > file or &> file, even together with | or |&, permanently redirects the stream so that I cannot pipe it afterwards:

my_command > output.log | grep something  # logs only stdout, prints only stderr
my_command &> output.log | grep something  # logs everything in correct order, prints nothing
my_command > output.log |& grep something  # logs everything in correct order, prints nothing
my_command &> output.log |& grep something  # logs everything in correct order, prints nothing

Any use of tee will either

  • print what comes from stderr, then log everything that comes from stdout and print it too, so I lose the original ordering of the text, or
  • log both in the correct order if I use |& tee, but then I lose control over the streams since everything now goes to stdout.

For example:

my_command | tee output.log | grep something  # logs only stdout, prints all of stderr then all of stdout
my_command |& tee output.log | grep something  # logs everything, prints everything to stdout
my_command | tee output.log 3>&1 1>&2 2>&3 | tee -a output.log | grep something  # logs only stdout, prints all of stderr then all of stdout

Now I'm all out of ideas.

This is what my test case looks like:

testFunction() {
  echo "output"
  1>&2 echo "error"
  echo "output-2"
  1>&2 echo "error-2"
  echo "output-3"
  1>&2 echo "error-3"
}
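
For instance, running the first attempt above against this function (a hypothetical session) makes the failure concrete: grep receives nothing because stdout went to the file, and the error lines reach the console directly, unfiltered:

$ testFunction > output.log | grep something
error
error-2
error-3
$ cat output.log
output
output-2
output-3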

I would like my console output to look like:

output
output-2
output-3

And my output.log file to look like:

output
error
output-2
error-2
output-3
error-3

For more details: I'm filtering the output of mvn clean install with grep to keep only minimal information in the terminal, but I would also like to have a full log somewhere in case I need to investigate a stack trace or something. The Java test logs are sent to stderr, so I chose to discard them from my console output.
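
For illustration only (the pattern is a placeholder, not my actual filter), the kind of invocation I have in mind looks like:

mvn clean install | grep -E '\[(ERROR|WARNING)\]|BUILD'

which keeps problems and the build result in the terminal, while the full output (including the stderr test logs) should end up in output.log.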

Alex M
  • OS-level buffering means there isn't really a way to establish a particular order on different filehandles. Maybe run a wrapper which reads both streams unbuffered in a tight loop. – tripleee Sep 05 '18 at 12:09
  • It's possible to reconstruct original ordering, but you need something like `sysdig` (dumping your syscalls into a ring buffer before execution) to have any guarantee of correctness. – Charles Duffy Sep 05 '18 at 15:58
  • (I wouldn't even say "OS-level buffering" is the problem -- even if you had no buffering at all, `tee` takes time to execute that a write direct to a file doesn't, letting one FD writing to a FIFO to `tee` and another writing straight to a file get out of sync; similarly, programs' scheduling can be delayed at will, so there are no guarantees that two writes to FIFOs connected to distinct copies of `tee` will result in those copies being scheduled to run in the same order that the original writes occurred). – Charles Duffy Sep 05 '18 at 16:25
  • The [related question](https://stackoverflow.com/questions/45760692/separately-redirecting-and-recombining-stderr-stdout-without-losing-ordering) specifies _recombining_ `stdout` and `stderr`, but my problem is more about _filtering_ them. Is there no `tee` equivalent that reads from both streams and writes to both streams? Then I could simply filter that. – Alex M Sep 05 '18 at 16:51
  • The very fact of dividing into two streams means there's no longer an OS-level interface (without getting into `ptrace`, `sysdig`, `dtrace` on MacOS/BSD, etc) that lets you determine with precision what order a pair of writes to the two distinct streams happened in. Whether you're filtering or combining is quite beside the point -- the point is that if you want to treat the streams distinctly, but to retain ordering between writes made to them, you need to use an approach akin to that given in the linked question's answer. – Charles Duffy Sep 05 '18 at 17:00
  • ...a "tee equivalent" isn't possible -- it doesn't matter whether it's one program handling both streams or two instances; what matters is that you don't have a guarantee provided by the operating system that the `write()`s to your FDs will be delivered as `read()`s to the program on the other end in the order that they occurred when they're over two separate sockets. If what you were asking for *were* possible, the `annotate-output` tool suggested by kvantour wouldn't have the caveats it does. – Charles Duffy Sep 05 '18 at 17:06
  • Yes, I just realized that my question actually precisely involved _recombining_ the streams... I did not suspect such a simple problem to be blocked by such low-level obstacles. Thanks for linking a workaround! – Alex M Sep 06 '18 at 08:38

1 Answer


While not really a solution based on redirection or anything of that sort, you might want to use annotate-output for this.

Assuming that script.sh contains your function, you can do:

$ annotate-output ./script.sh                                                     
13:17:15 I: Started ./script.sh
13:17:15 O: output
13:17:15 E: error
13:17:15 E: error-2
13:17:15 O: output-2
13:17:15 E: error-3
13:17:15 O: output-3
13:17:15 I: Finished with exitcode 0

It is now easy to reprocess that information and send it to the files you want. In the awk command below, `substr($0,13)` strips the 12-character `HH:MM:SS X: ` prefix; every `O:` or `E:` line is written to the log file, and only the `O:` lines are printed to stdout:

$ annotate-output ./script.sh \
  | awk '{s=substr($0,13)}/ [OE]: /{print s> "logfile"}/ O: /{print s}'
output
output-2
output-3
$ cat logfile 
output
error
error-2
output-2
error-3
output-3

Or use any other combination of tee, sed, cut, ...
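
For example, here is one possible variant (just a sketch -- it assumes bash, for the process substitution, and reuses the `logfile` name from above) built from grep, cut, tee, and sed instead of awk:

annotate-output ./script.sh |
  grep -E '^..:..:.. [OE]: ' |  # keep only the tagged O:/E: lines
  cut -c10- |                   # drop the "HH:MM:SS " timestamp
  tee >(cut -c4- > logfile) |   # full copy, tag stripped, into the log
  sed -n 's/^O: //p'            # console: stdout lines only

Note that the >( ) process substitution writes the log asynchronously, so logfile may still be catching up for a moment after the pipeline returns.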

As per comment from @CharlesDuffy:

Since stdout and stderr are processed in parallel, it can happen that some lines received on stdout will show up before later-printed stderr lines (and vice-versa).

This is unfortunately very hard to fix with the current annotation strategy. A fix would involve switching to PTRACE'ing the process. Giving nice a (much) higher priority over the executed program could, however, cause this behaviour to show up less frequently.

source: man annotate-output

kvantour
  • Unless this is implemented to redirect syscalls as with `ptrace()`, to use an `LD_PRELOAD` shim that overrides C standard-library write operations, or to do some other monkey-in-the-middle hackery, it **cannot** guarantee that it retains original ordering. – Charles Duffy Sep 05 '18 at 15:55
  • ...indeed, the `BUGS` section of the linked man page says exactly that. – Charles Duffy Sep 05 '18 at 15:57
  • @CharlesDuffy Just added the man page section – kvantour Sep 05 '18 at 16:00
  • @CharlesDuffy, maybe adding nanoseconds to the timestamp (`+%s.%N`) and then sorting might work. – kvantour Sep 05 '18 at 16:04
  • The time it's read by `awk` is not the time the `write()` syscall is invoked, which is the one that matters for synchronization. Writes don't *have* a well-defined ordering at all when they aren't to two copies of the same file descriptor (or, of course, the same copy), so the ordering you want *doesn't exist* in any OS-level guarantees. – Charles Duffy Sep 05 '18 at 16:06
  • Trying to add precision on the other side of a pair of separate/distinct FIFOs is trying to create information out of the ether -- sometimes two reads will happen in opposite order from the paired writes, and recording that potentially-reversed order with greater precision does nothing to solve the underlying issue. – Charles Duffy Sep 05 '18 at 16:10
  • @CharlesDuffy, I just noticed that `annotate-output` is nothing more than a bash script. I agree with your comments, and especially with the PS in [your answer](https://stackoverflow.com/a/45761769/8344060). However, for simple logging purposes where the world will not end if the order is not 100% guaranteed, it remains a useful tool. That might not be the case, though, if the program rapidly spits out errors and output. – kvantour Sep 05 '18 at 16:16