
I want to do the following:

         command2(stdout)
        /                \
command1                  command4
        \                /
         command3(stderr)

This is covered in How can I split and re-join STDOUT from multiple processes?

Except that command1 writes different text to stdout and stderr, so what I need is a combination of the above question and Pipe stderr to another command.

For context, here is what I am trying to achieve:

  1. Execute curl.
  2. Capture the raw output (stdout), base64-encode it, and embed it into JSON: curl https://someaddress.tld | base64 | jq --raw-input '{"curl_ret" : .}'
  3. Have curl write its own JSON metadata (return code etc.) to stderr: curl --write-out '%{stderr}%{json}' https://someaddress.tld
  4. Since #2 and #3 are the same curl call, merge the outputs of #2 and #3 and pass the merged result to jq: jq --slurp ...

All these in one piped command.

The stdout/stderr separation is done to avoid parsing merged text and the pitfalls that come with it, given that curl output can be anything. curl has a --silent switch, so there is no unexpected text in either output stream.
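Spelling steps #2-#4 out naively gives something like the sketch below (hypothetical and untested; as the comments below point out, nothing guarantees how the two streams interleave once merged, which is exactly the problem):

{
  # body -> base64 -> one JSON object on stdout; curl's %{json} metadata
  # goes to stderr, which the outer 2>&1 folds back into the same pipe
  curl --silent --write-out '%{stderr}%{json}' https://someaddress.tld \
    | base64 | jq --raw-input --slurp --compact-output '{"curl_ret": .}'
} 2>&1 | jq --slurp 'add'   # merge the two objects; unsafe if writes interleave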

  • In the general case, ignoring the fact that the command is curl: The tricky thing about this is retaining the original line ordering. Everything else is easy. Keeping the order intact is impossible if you only have standard POSIX semantics. – Charles Duffy Feb 02 '23 at 03:14
  • Anyhow, do you really need it to be a pipe at all, vs capturing content in memory and then running the latter command after the first is done? (Remember, all parts of a pipeline run in parallel) – Charles Duffy Feb 02 '23 at 03:18
  • Ideally a single command, since it all runs via Terraform's `data "external"`. So it can be a multi-command "one-liner", `command1; command2`. And no stray files, since security requirements are tight – Vetal Feb 02 '23 at 03:22
  • How about named pipes? They don't contain any data so they don't pose the same risk as real files (the data traversing a named pipe is all in kernel memory buffers, never on the filesystem) – Charles Duffy Feb 02 '23 at 03:23
  • Alternately, can we assume bash 4.0 or newer? (If you need `/bin/sh` compatibility, the answer to this is "no") – Charles Duffy Feb 02 '23 at 03:26
  • Thank you for the hint! I will read about named pipes. I feel like I read about these back in university in a Unix course, along with forks (1995). Now it starts making sense :) As for bash, it is safe to assume bash 3.2 and zsh, which is what devs have on a Mac – Vetal Feb 02 '23 at 03:30
  • I'd _hope_ a professional developer on a Mac would use Nix, or MacPorts, or Homebrew (listed in my personal deeply-opinionated order of preference) to give them access to a bash newer than the more-than-a-decade-out-of-date thing Apple ships. :) – Charles Duffy Feb 02 '23 at 04:00
  • _Really_, though, I'd drop the `%{stderr}` from `--write-out` and just expect whatever the last line of content is on stdout to be your `%{json}`. That ordering is entirely well-defined: curl doesn't _know_ the content of that variable until the body is completely 100% written. – Charles Duffy Feb 02 '23 at 04:39
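
To make the named-pipe suggestion from the comments concrete, a minimal round trip looks like this (illustrative only, with a made-up path):

mkfifo /tmp/demo.fifo                # creates the FIFO node; it stores no data itself
printf 'hello\n' >/tmp/demo.fifo &   # the writer blocks until a reader opens the FIFO
cat /tmp/demo.fifo                   # prints "hello"; the bytes only ever live in kernel buffers
rm /tmp/demo.fifo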

1 Answer


In practice, for the use case at hand

You don't need to do this at all. --write-out '%{json}' will always be written after the body content, so it's always the last line of stdout. It's safe to have it in the same stream.

getExampleAndMetadata() {
  # %{json} is only known after the body is fully written,
  # so it is always the last line of stdout
  curl --silent --write-out '%{json}' https://example.com |
    jq -RSs '
      split("\n")
      | {"body": .[:-1] | join("\n"),   # everything before the last line
         "logs": .[-1] | fromjson}'     # the trailing %{json} metadata
}
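
For example (hypothetical invocation; the fields under .logs are whatever curl's %{json} object contains, such as http_code):

getExampleAndMetadata | jq '.logs.http_code'
# 200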

As an exercise

It's ugly, and I don't recommend it -- it'd be better to just parse the two sets of data out of stdout and leave stderr free so you have a way to log actual errors -- but here's how to do what you asked for:

getExampleAndMetadata() {
  local tmpdir curl_pid retval=0
  # mktemp's template handling differs between operating systems, adjust to fit
  tmpdir=$(mktemp -d sa_query.XXXXXX) || return
  mkfifo "$tmpdir/stdout" "$tmpdir/stderr" || { retval=$?; rm -rf "$tmpdir"; return "$retval"; }
  curl --silent --write-out '%{stderr}%{json}' https://example.com \
    >"$tmpdir/stdout" 2>"$tmpdir/stderr" & curl_pid=$!
  # jq's order of reading its two inputs is not well-defined, and reading the
  # FIFOs sequentially in the wrong order would deadlock, so we use process
  # substitutions to buffer both stdout and stderr in memory first
  jq -Rn \
    --rawfile content <(out=$(cat "$tmpdir/stdout"); printf '%s\n' "$out") \
    --slurpfile logs <(err=$(cat "$tmpdir/stderr"); printf '%s\n' "$err") \
    '{"body": $content, "logs": $logs}'; (( retval |= $? ))
  rm -rf -- "$tmpdir"; (( retval |= $? ))
  wait "$curl_pid"; (( retval |= $? ))
  return "$retval"
}

...gives you a simple command, getExampleAndMetadata. And of course, if you eliminate the comments and line continuations you can collapse the whole thing to one line by adding ;s appropriately.

  • @peak, if you're around, I'm curious if the semantics around the order jq reads its input in are well-defined. If we could make it slurp one input in entirety before trying to read from the other at all, it would allow the process substitution hackery to be avoided. – Charles Duffy Feb 02 '23 at 04:37
  • Indeed, a very concise and readable answer. It looks like my original idea was overkill. Besides, the deadlock note explains why I had lockups while experimenting with pipes. A very nice answer, both practical and educational, thanks a lot! – Vetal Feb 02 '23 at 05:24
  • BTW, to be clear, stdout-before-stderr is specific to stdout and stderr as they're written by curl when called in this manner specifically; different programs will have different usage patterns. Usual best practice is to read from both in parallel so the order of writes doesn't matter. – Charles Duffy Feb 02 '23 at 05:41