Bash: how to print and run a cmd array which has the pipe operator, |, in it

Question

This is a follow-up to my question here: How to write bash function to print and run command when the command has arguments with spaces or things to be expanded

Suppose I have this function to print and run a command stored in an array:

# Print and run the cmd stored in the passed-in array
print_and_run() {
    echo "Running cmd:  $*"
    # run the command by calling all elements of the command array at once
    "$@"
}

This works fine:

cmd_array=(ls -a /)
print_and_run "${cmd_array[@]}"

But this does NOT work:

cmd_array=(ls -a / | grep "home")
print_and_run "${cmd_array[@]}"

Error: syntax error near unexpected token `|':

eRCaGuy_hello_world/bash$ ./print_and_run.sh 
./print_and_run.sh: line 55: syntax error near unexpected token `|'
./print_and_run.sh: line 55: `cmd_array=(ls -a / | grep "home")'

How can I get this concept to work with the pipe operator (|) in the command?

The problem is not in `print_and_run` but in `var=(foo | bar)` .. you'll need to escape the pipe character. — himdel, Feb 17 '22 at 00:16
@himdel, how do I escape it? If I do this: `cmd_array=(ls -a / \| grep "home")` then bash now thinks `|` is an *input* to the `ls` command, and I get: `ls: cannot access '|': No such file or directory` — Gabriel Staples, Feb 17 '22 at 00:35
To be clear, my strong advice here is "don't". When you need compound commands represented in a single command line with no syntax, that's a place for a shell function. For example, `grepping_for() { local pattern="$1"; shift; "$@" | grep -e "$pattern"; }` allows `cmd_array=( grepping_for "home" ls -a / )` to be represented safely. Or, y'know, you can just use `set -x` with `PS4` to customize its output, and let the shell do the logging for you instead of implementing it by hand at all in the first place. — Charles Duffy, Feb 17 '22 at 00:51
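The wrapper-function idea in the comment above can be sketched roughly as follows, combined with the `print_and_run` function from the question. This is a minimal sketch, not a definitive implementation. Note the `shift`, which removes the pattern from the argument list before the remaining words are run as the command.

```shell
#!/usr/bin/env bash

# Print and run the command passed in as arguments (from the question).
print_and_run() {
    echo "Running cmd:  $*"
    "$@"
}

# Wrapper from the comment: the first argument is the grep pattern, and the
# remaining arguments are the command whose output gets filtered.
grepping_for() {
    local pattern="$1"; shift
    "$@" | grep -e "$pattern"
}
```

With these definitions, `cmd_array=( grepping_for "home" ls -a / )` followed by `print_and_run "${cmd_array[@]}"` logs the array as a single line and then runs the pipeline, with no `|` stored as data.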
@CharlesDuffy, thank you. I'm still trying to absorb everything. Would you mind demonstrating the `set -x` with custom `PS4` alternative too? The problem is, I'd like only *some* commands to be printed before they are run, not _all_ commands. Is that possible? — Gabriel Staples, Feb 17 '22 at 01:12
`set -x` is a little noisy -- you can turn it off with `set +x`, but the syntax needed to suppress the `set +x` command _itself_ being logged is a little uglier; so it's not perfect. That said, consider, at the top of your script: `PS4='$BASH_SOURCE:$LINENO+'`, and then later when you want to log a command: `set -x; ls -a / | grep "home"; { set +x; } 2>/dev/null` — Charles Duffy, Feb 17 '22 at 01:15
BTW, if you want to log the pipeline all as one line, one can use `set -v` instead of `set -x` (and similarly, `{ set +v; } 2>/dev/null` to silently disable); but that doesn't show you the values of substituted variables, so it's of limited utility in the same way that `eval` with a completely non-parameterized string is. — Charles Duffy, Feb 17 '22 at 01:17
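The `set -x` approach described in the comments above might look roughly like this when only one pipeline should be logged. The function name `log_pipeline_demo` and the sample input words are made up for illustration. The trace goes to stderr.

```shell
#!/usr/bin/env bash

# Prefix each traced command with the source file and line number.
# Single quotes delay expansion until the trace line is printed.
PS4='$BASH_SOURCE:$LINENO+'

# Hypothetical helper: trace exactly one pipeline, then stop tracing.
log_pipeline_demo() {
    set -x
    printf '%s\n' alpha beta | grep "alpha"
    # The braces plus the redirection keep `set +x` itself out of the trace.
    { set +x; } 2>/dev/null
}

echo "this command is not traced"
log_pipeline_demo
echo "neither is this one"
```

One trade-off: after `{ set +x; } 2>/dev/null`, `$?` holds the status of `set +x`, not of the pipeline, so save the pipeline's status first if you need it.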

Charles Duffy · Answer 1 · 2022-02-17T01:21:01.170

If you want to treat an array element containing only | as an instruction to generate a pipeline, you can do that. I don't recommend it -- it's a security risk unless you verify that no variable expanded into the array can consist solely of a single pipe character -- but it's possible.

Below, we create a random single-use "$pipe" sigil to make that attack harder. If you're unwilling to do that, change [[ $arg = "$pipe" ]] to [[ $arg = "|" ]].

# generate something random to make an attacker's job harder
pipe=$(uuidgen)

# use that randomly-generated sigil in place of | in our array
cmd_array=(
  ls -a /
  "$pipe" grep "home"
)

exec_array_pipe() {
  local arg cmd_q
  local -a cmd=( )
  while (( $# )); do
    arg=$1; shift
    if [[ $arg = "$pipe" ]]; then
      # log an eval-safe copy of what we're about to run
      printf -v cmd_q '%q ' "${cmd[@]}"
      echo "Starting pipeline component: $cmd_q" >&2
      # Recurse into a new copy of ourselves as a child process
      "${cmd[@]}" | exec_array_pipe "$@"
      return
    fi
    cmd+=( "$arg" )
  done
  printf -v cmd_q '%q ' "${cmd[@]}"
  echo "Starting pipeline component: $cmd_q" >&2
  "${cmd[@]}"
}

exec_array_pipe "${cmd_array[@]}"

See this running in an online sandbox at https://ideone.com/IWOTfO

Gabriel Staples · Answer 2 · 2022-02-17T00:45:15.943

Do this instead. It works.

print_and_run() {
    echo "Running cmd: $1"
    eval "$1"
}

Example usage:

cmd='ls -a / | grep -C 9999 --color=always "home"'
print_and_run "$cmd"

Output:

Running cmd: ls -a / | grep -C 9999 --color=always "home"
(rest of output here, with the word "home" highlighted in red)

edited Feb 17 '22 at 00:45

answered Feb 17 '22 at 00:43

"Works" -- the [reasons not to use `eval`](https://mywiki.wooledge.org/BashFAQ/048) all apply. – Charles Duffy Feb 17 '22 at 00:44
...let's say you want to parameterize `home` to come from the user-specified, untrusted variable `pattern`. If one uses `cmd='... "$pattern"'`, then the log message doesn't tell you what was actually run, so it's of extremely limited value. If one uses `cmd='... '"$pattern"`, one is subject to shell injection vulnerabilities via hostile names. – Charles Duffy Feb 17 '22 at 00:45
...if one uses `cmd='...'"${pattern@Q}"`, one gets something that's secure but only compatible with bash 4.4 or later. If one uses `printf -v cmd '...%q' "$pattern"`, one gets safe and compatible code, but it's pretty darned wordy at that point. – Charles Duffy Feb 17 '22 at 00:47
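A minimal sketch of the `printf -v cmd '...%q'` variant mentioned above, paired with the `eval`-based `print_and_run` from this answer. The value of `pattern` is a made-up hostile input. Without `%q`, the `;` in it would start a second command.

```shell
#!/usr/bin/env bash

# eval-based version from this answer: takes the whole command as one string.
print_and_run() {
    echo "Running cmd: $1"
    eval "$1"
}

# Hypothetical untrusted value, chosen to attempt a shell injection.
pattern='home; echo INJECTED'

# %q escapes the value so that eval reads it back as exactly one word.
printf -v cmd 'ls -a / | grep -e %q' "$pattern"

# grep exits non-zero when nothing matches, which is expected here.
print_and_run "$cmd" || echo "(no match, exit status $?)"
```

The logged line shows the escaped form, `grep -e home\;\ echo\ INJECTED`, so the log reflects what actually ran and the pattern stays a single argument to `grep`.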
@CharlesDuffy, how can I put a pipe in an array then? What do you propose as a solution to my question? Does no answer exist? – Gabriel Staples Feb 17 '22 at 00:49
What I propose as the answer depends on why you need it; approaches I've personally taken to solve this problem in the past range from trivial shell wrappers like the `grepping_for` example given in a comment on the question to a library of helpers intended to support `execline`-style programming in shell (for an intro, see https://skarnet.org/software/execline/) – Charles Duffy Feb 17 '22 at 00:53
There's no universal answer because _the whole point_ of using an array is to specify an exact argv with no ability for data to turn into syntax behind your back. So if we answered your question in the manner asked, we would either need to restrict the data that could be passed, accept those security risks, or otherwise pick a compromise to make _somewhere_. – Charles Duffy Feb 17 '22 at 00:55
BTW, another way I've done this in the past is with something like `with_pipes 3 ls -a / 2 grep "home"` -- prefixing each sub-array with the number of items within it. That way you don't need to compromise the range of possible values that can be passed, and you can use `${@:start:len}` to refer to only subsets of your command-line argument list, making for an efficient implementation. – Charles Duffy Feb 17 '22 at 01:22
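The count-prefixed idea in the last comment could be sketched along these lines. Only the name `with_pipes` and the calling convention come from the comment. The body, the logging format (borrowed from `exec_array_pipe` in the other answer), and the error handling are assumptions.

```shell
#!/usr/bin/env bash

# Usage: with_pipes N1 word... N2 word... ...
# Each count says how many of the following arguments form one pipeline stage.
with_pipes() {
    local len=${1-} cmd_q
    # The count must be a positive integer with at least that many words after it.
    if ! [[ $len =~ ^[1-9][0-9]*$ ]] || (( len >= $# )); then
        echo "with_pipes: bad count: '$len'" >&2
        return 2
    fi
    shift
    local -a cmd=( "${@:1:len}" )   # the words of this stage only
    shift "$len"
    printf -v cmd_q '%q ' "${cmd[@]}"
    echo "Starting pipeline component: $cmd_q" >&2
    if (( $# )); then
        # More stages follow: feed this stage into the rest of the pipeline.
        "${cmd[@]}" | with_pipes "$@"
    else
        "${cmd[@]}"
    fi
}
```

Called as `with_pipes 3 ls -a / 2 grep "home"`, this runs `ls -a / | grep home`. Because stage boundaries come from the counts and not from a sentinel value, an argument that happens to be a literal `|` is passed through as ordinary data.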

KamilCuk · Answer 3 · 2022-02-17T08:47:19.367

The general direction is that you don't. You do not store the whole command line to be printed later, and this is not the direction you should take.

The "bad" solution is to use eval.

The "good" solution is to store the literal '|' character inside the array (or some better representation of it) and parse the array, extract the pipe parts and execute them. This is presented by Charles in the other amazing answer. It is just rewriting the parser that already exists in the shell. It requires significant work, and expanding it will require significant work.

The end result is that you are reimplementing parts of the shell inside the shell, basically writing a shell interpreter in shell. At this point, you can just consider taking Bash sources and implementing a new shopt -o print_the_command_before_executing option in the sources, which might just be simpler.

However, I believe the end goal is to give users a way to see what is being executed. I would propose to approach it like .gitlab-ci.yml does with script: statements. If you want to invent your own language with "debug" support, do just that instead of half-measures. Consider the following YAML file:

- ls -a / | grep "home"
- echo other commands
- for i in "stuff"; do
      echo "$i";
  done
- |
  for i in "stuff"; do
      echo "$i"
  done

Then the following "runner":

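import os
import shlex
import sys

import yaml  # third-party: PyYAML

script = []
# Each top-level YAML list item is one shell snippet to log and then run.
with open(sys.argv[1], "r") as f:
    commands = yaml.safe_load(f)
for line in commands:
    script += [
        "echo + " + shlex.quote(line).replace("\n", "<newline>"),  # some unicode like ␤ would look nice
        line,
    ]
# Replace this process with one bash instance that runs the generated script.
os.execvp("bash", ["bash", "-c", "\n".join(script)])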

Executing the runner results in:

+ ls -a / | grep "home"
home
+ echo other commands
other commands
+ for i in "stuff"; do echo "$i"; done
stuff
+ for i in "stuff"; do<newline>    echo "$i"<newline>done<newline>
stuff

This offers greater flexibility, is rather simple, and supports any shell construct with ease. You can try GitLab CI/CD on their repository and read the docs.

The YAML format is only an example of the input format. Using special comments like # --- cut --- between parts and extracting each part with the parser will allow running shellcheck over the script. Instead of generating a script with echo statements, you could run Bash interactively, print the part to be executed and then "feed" the part to be executed to interactive Bash. This will allow preserving $?.
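As a rough sketch of the marker-comment variant, the following splits a script file on # --- cut --- lines, prints each part, and runs it. It is a simplification of what is described above: it uses eval in the current shell in place of feeding an interactive Bash, so it is only appropriate for script files you trust. The names run_part and run_parts are made up for illustration.

```shell
#!/usr/bin/env bash

# Print one part with a "+ " prefix, then run it in the current shell.
run_part() {
    local part=$1 nl=$'\n'
    part=${part%"$nl"}                              # drop the trailing newline
    [[ -n ${part//[[:space:]]/} ]] || return 0      # skip empty parts
    printf '+ %s\n' "${part//"$nl"/<newline>}"
    eval "$part"                                    # trusted input only
}

# Split the script file "$1" on marker lines and run each part in order.
run_parts() {
    local marker='# --- cut ---' part='' line
    # Read the file on fd 3 so the commands being run keep their own stdin.
    while IFS= read -r -u 3 line || [[ -n $line ]]; do
        if [[ $line == "$marker" ]]; then
            run_part "$part"
            part=''
        else
            part+=$line$'\n'
        fi
    done 3< "$1"
    run_part "$part"
}
```

Because each part runs in the current shell, $? after run_parts is the status of the last part, and the input file stays an ordinary script that shellcheck can read.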

Either way - with a "good" solution, you end up with a custom parser.

score 0 · Answer 4 · answered Feb 17 '22 at 08:30

Instead of passing an array, you can pass the whole function and use the output of declare -f with some custom parsing:

print_and_run() {
    echo "+ $(
        declare -f "$1" |
        # Remove `f() {` and `}`. Remove indentation.
        sed '1d;2d;$d;s/^ *//' |
        # Strip trailing newlines, then replace every remaining newline with <newline>.
        sed -z 's/\n*$//;s/\n/<newline>/g'
    )"
    "$@"
}

cmd() { ls -a / | grep "home"; }
print_and_run cmd

Results in:

+ ls --color -F -a / | grep "home"
home/

This supports any shell construct, still lets you check the code with shellcheck, and doesn't require much work.
