
I am trying to use multiple process substitutions in a Bash command, but I seem to be misunderstanding the order in which they resolve and redirect into each other.

The System

Ubuntu 18.04
BASH version - GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)

The Problem

I am trying to redirect the output of a command into tee, have that redirect into ts (adding a timestamp), and then have that redirect into split (splitting the output into separate files). I can get the output to redirect into tee and ts, but when redirecting into split I run into a problem.

My Attempts

command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' > tempfile.txt)) - this redirects the output into the process substitution with tee, then redirects into the process substitution with ts (which adds the timestamp), and then redirects into tempfile.txt. This is what I would expect.

command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -))) - this does nothing, even though I would have hoped the result would be a bunch of 10-byte files with timestamps on the different rows.

To continue testing, I tried with echo instead to see what happens: command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(echo))) - the output from the initial tee prints (as it should), but the echo prints an empty line. Apparently this is irrelevant because of a new result I got - see the edit at the bottom.

command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]') >(split -d -b 10 -)) - This prints the command output with the timestamp (as tee and ts should) and in addition creates 10-byte files with the command output (no timestamps in them). This is what I expected, and it makes sense since tee writes to both process substitutions separately; it was mostly a sanity check.

What I think is happening

From what I can tell, >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -)) is resolving first, as a complete and separate command of its own. Thus split (and echo) receive empty output from ts, which has no output of its own. Only after this does the actual command resolve and send its output to its tee substitution.

This doesn't explain why command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' > tempfile.txt)) does work, since by this theory tee by itself has no output, so ts should be receiving no input and should also output a blank.

All this is to say I am not really sure what is happening.
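
A quick way to see part of what is happening (this matches the explanation in ilkkachu's answer below): an unredirected process substitution expands to a path such as /dev/fd/63, and the command simply receives that path as an ordinary argument unless it is the target of a redirection like > >(...). A minimal check, using true only as a placeholder command:

echo >(true)       # prints something like: /dev/fd/63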

What I want

Basically I just want to understand how to make command >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -))) work the way it seems it should. I need the command's output to go to the process substitution with tee, which will send it to the process substitution with ts to add the timestamps, which will send it to split to split the output into several small files.

I have tried command > >(echo) and saw the output is blank, which is not what I expected (I expected echo to receive and then output the command's output). I think I am just very much misunderstanding how process substitution works at this point.
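
(As the comments below point out, echo ignores its standard input, while cat copies it through, so cat is the right tool for this kind of experiment. A small demo:)

printf 'hello\n' | echo        # prints only an empty line; echo never reads stdin
printf 'hello\n' | cat         # prints: hello
printf 'hello\n' > >(cat)      # prints: hello (possibly after the prompt returns)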

Oha Noch
  • Why not just `command | ts '[%Y-%m-%d %H:%M:%S]' | split -d -b 10 -`? – 644 Mar 10 '21 at 14:55
  • I didn't mention this, but the actual command I am using splits stdout and stderr and I think this doesn't allow for piping (but maybe I'm wrong?). Also I would prefer (though this is more of a "nice to have") to use `tee` so that my output is also displayed on the screen at the same time. – Oha Noch Mar 10 '21 at 15:00
  • `echo` ignores its input. You are perhaps thinking of `cat`, which reads its input and writes everything to its output. – William Pursell Mar 10 '21 at 15:03
  • @WilliamPursell thank you. Yes, I noticed this while experimenting with cat and seeing it work. I never realized, though, that echo ignores input when being piped to. – Oha Noch Mar 10 '21 at 15:07
  • If you want the error stream to also go into the pipe to `ts`, just do `command 2>&1 | ts ... | split ...` – William Pursell Mar 10 '21 at 15:08
  • You could also do `command |& ts ... |` as `|&` is shorthand for `2>&1 |` – 644 Mar 10 '21 at 15:11
  • So I've got to `command | ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -)`, which does everything except for the error stream. The thing is, I would prefer to have the error stream separate. I used to have it with `command > >(...split for output stream ...) 2> >(..split for error stream....)`, but with pipes that's not possible. With @WilliamPursell's recommendation of using `2>&1` they will both end up going to the same split, which means they will end up in the same file. One file will also work; at this point I am being greedy trying to get it perfectly how I want (see the sketch after this comment thread). – Oha Noch Mar 10 '21 at 15:14
  • @Nobody and William Pursell, in any case your help provided me with a solution that is super helpful and is 99% of what I wanted, so thank you so much! If either one of you wants to put it as an answer I will happily accept it; otherwise I will answer myself. And if you happen to know how I can still split the streams into separate files, that would still be very appreciated! – Oha Noch Mar 10 '21 at 15:30
  • @WilliamPursell tagging you so you don't miss the above comment, which was also meant for you. It didn't allow me to tag multiple people. – Oha Noch Mar 10 '21 at 15:35
  • Note that splitting out the error stream means you lose precise ordering -- `write()`s only _have_ a guaranteed order relative to each other when they go to the same file descriptor (or copies of the same file descriptor). As soon as you have two different kinds of handling between stdout and stderr, they can't possibly be copies of the same FD, so POSIX semantics no longer guarantees ordering preservation. – Charles Duffy Mar 10 '21 at 16:06
  • ...see a detailed discussion of the above at https://stackoverflow.com/questions/45760692/separately-redirecting-and-recombining-stderr-stdout-without-losing-ordering – Charles Duffy Mar 10 '21 at 16:08
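
For reference, the pipeline the comment thread converges on looks roughly like this; command is the question's placeholder, and the out filename prefix for split is purely illustrative:

# Merge stderr into stdout, timestamp each line, show it on screen via tee,
# and split it into 10-byte files named out00, out01, ...
command 2>&1 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 - out)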

3 Answers


One thing you could do, if you really want to have one command redirect stdout/stderr to a separate ts|tee|split pipeline, is this:

command 1> >(ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -)) 2> >(ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -))

But the downside is tee only prints after the prompt gets printed. There is probably a way to avoid this by duplicating file descriptors, but this is the best I could think of.

644
  • ..."still expecting input"? That depends on what your `command` is; do we know that the command in use expects input at all? – Charles Duffy Mar 10 '21 at 16:08
  • I suspect what you're seeing is not something "expecting input", but instead, deferred writes reaching the console _after_ the prompt is printed, so the prompt is mixed in with the output -- thus, pressing enter just prints a second prompt. – Charles Duffy Mar 10 '21 at 16:09
  • @CharlesDuffy ah yep tee prints after the prompt gets printed. – 644 Mar 10 '21 at 16:11
  • ...with respect to how that can be avoided -- I've historically written code that uses `flock -s` to create a shared lock for each of the commands involved in doing the writing, and then uses `flock -x` to wait for all those processes to exit (since an exclusive lock can only be created after all the shared locks are closed). – Charles Duffy Mar 10 '21 at 16:14
  • (btw, it's not really right to say "tee _only_ prints after the prompt" -- tee can and often will print earlier, depending on how much content there is and how long `command` takes to execute; it's just not guaranteed that it won't _also_ continue to print content after `command` has exited and the parent shell has detected that and printed the prompt). – Charles Duffy Mar 10 '21 at 16:15
  • Surely there's a cleaner way to do this without needing lockfiles? – 644 Mar 10 '21 at 16:21
  • If you have a new enough version of bash to assign `$!` to process substitutions, that'll let you `wait` for them; but the lockfile way is portable back to older versions. – Charles Duffy Mar 10 '21 at 16:27
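
A rough sketch of the `wait`-based variant from the last comment, assuming a bash new enough that `$!` is set to the PID of the most recent process substitution and that `wait` can reap it; otherwise the `flock` approach is the portable fallback:

# The outer >(...) runs ts | tee; waiting for it keeps tee's output from
# landing after the prompt has already been printed.
command > >(ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 -))
wait $!    # assumed here to be the PID of the process substitution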

You can send the error stream from the command into a different pipeline than the output, if that is desired:

{ { cmd 2>&3 | ts ... | split; } 3>&1 >&4 | ts ... | split; } 4>&1

This sends the output of cmd to the first pipeline, while the error stream from cmd goes into the second pipeline. File descriptor 3 is introduced to keep the error streams from ts and split separate (they still go to the original stderr rather than into the second pipeline), but that may be undesirable. fd 4 is introduced to prevent the output of split from being consumed by the second pipeline, and that may be unnecessary (if split does not produce any output, for example).
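
A self-contained way to see the routing, using a tiny stand-in for cmd and sed tags in place of the two ts | split pipelines (the tags are purely illustrative):

# The stand-in writes one line to stdout and one to stderr; the fd dance
# routes stdout through the first sed and stderr through the second.
{ { sh -c 'echo out; echo err >&2' 2>&3 | sed 's/^/stdout pipeline: /'; } 3>&1 >&4 | sed 's/^/stderr pipeline: /'; } 4>&1
# prints (in either order):
#   stdout pipeline: out
#   stderr pipeline: err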

William Pursell
  • I am not 100% sure it works with the error stream because I can't test that at the moment, but otherwise it seems to work like a charm! Thanks! I did just add the tee option so that the output is also printed to the screen, so I ended up with `{ { cmd 2>&3 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 - out); } 3>&1 >&4 | ts '[%Y-%m-%d %H:%M:%S]' | tee -a >(split -d -b 10 - err); } 4>&1` – Oha Noch Mar 11 '21 at 09:50

This:

ts '[%Y-%m-%d %H:%M:%S]' >(split -d -b 10 -)

expands the file name generated by the process substitution on the command line of ts, so what gets run is something like ts '[%Y-%m-%d %H:%M:%S]' /dev/fd/63. ts then tries to open the fd that goes to split to read input from there, instead of reading from the original stdin.

That's probably not what you want, and on my machine, I got some copies of ts and split stuck in the background while testing. Possibly successfully connected to each other, which may explain the lack of error messages.

You probably meant to write

ts '[%Y-%m-%d %H:%M:%S]' > >(split -d -b 10 -)
                         ^

with a redirection to the process substitution.

That said, you could just use a pipe there between ts and split.
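
Putting both fixes together, and assuming the outermost substitution in the question is also meant to be the target of a redirection (> >(...)), a sketch of the intended command would be:

# stdout of command -> tee (copies to the terminal) -> ts adds timestamps -> split
command > >(tee -a >(ts '[%Y-%m-%d %H:%M:%S]' | split -d -b 10 -))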

ilkkachu