Why will Bash's read builtin not take input from the yes command via a pipe, but will work with process substitution?

Question

TL;DR

I'd like to understand why the yes command works properly with most tools and scripts that read from standard input, but fails to work with Bash's own read builtin except when using process substitution or a complex set of shell options. I find this behavior surprising and poorly documented, although I think it's related to the way that Bash pipelines typically create subshells.

Bash's Read Builtin

I'm using Bash 5.1.16(1)-release on macOS 11.6.3. The yes command is therefore from BSD, but I'm seeing the same behavior on various Linux systems. Specifically, the output of yes can be successfully piped into shell scripts and tools that read from standard input, but for some reason I can't get it to populate a variable using the Bash read builtin. Since yes uses standard output, and read defaults to standard input, I'd expect the following to populate the builtin's default REPLY variable:

yes | read
echo "$REPLY"

However, REPLY isn't even set:

$ declare -p REPLY
bash: declare: REPLY: not found

Blaming the delimiter doesn't seem to help, and isn't borne out by the tests immediately below. If the problem were the lack of a newline, either of the following character-oriented options should work:

$ yes | read -n 1; declare -p REPLY
$ yes | read -N 1; declare -p REPLY

but again, in both cases Bash reports bash: declare: REPLY: not found.

Please note that the problem is the same even if I explicitly define a variable to populate. It isn't an issue with read's default REPLY variable; it seems to be an issue with the way that the builtin expects to get input.

Process Substitution, Some Complex Commands, and Non-Builtins Work Fine

On the other hand, Bash's process substitution works just fine:

$ read < <(yes)
$ echo "$REPLY"
y

Why would it work with process substitution, but not with a simple pipe? It also sort of works if I try to access REPLY from within a complex command. For example, after being sure to unset the REPLY variable with unset REPLY:

$ unset REPLY
$ yes | { read; echo "$REPLY"; }
y

$ declare -p REPLY
bash: declare: REPLY: not found

Obviously, it also works as expected with other tools that take standard input. For example, using Perl or Ruby:

$ yes | perl -ne 'print; exit'
y

$ yes | ruby -nle 'pp $_; exit'
"y"

Partial Answer from Related Question

Finally, based on a comment buried within a related question, it looks like you can make a standard(ish) pipeline work if you:

disable job control with the set builtin, and
enable the shell's lastpipe option with shopt.

For example:

$ shopt -s lastpipe; \
    set +m; \
    unset REPLY; \
    yes | read; \
    echo "$REPLY"
y

At least this defines the problem as a subshell-related issue rather than an issue with standard input, but it doesn't really explain why the limitation exists or what exactly job control has to do with this. If this is expected and foundational behavior for Bash, it's not really intuitive, and I'd appreciate a better explanation (if one exists) for the semantics of this.

dan · Answer 1 · 2022-02-03T03:16:19.097

In Bash, each command in a pipeline runs in its own subshell. Subshells have their own variable scope, which ends when their command(s) (read in this case) finish executing.
For yes | read, read writes its variable (REPLY) to a subshell environment, which goes out of scope as soon as read finishes executing.
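As a minimal sketch of that scoping rule, the snippet below assigns a variable in the last stage of a pipeline and then prints it from the parent shell. The names greeting and outcome are made up for illustration, and the test runs inside bash -c so that it uses a fresh, non-interactive Bash with default options.

```shell
# Hypothetical variable names; the inner script runs in a fresh bash with default options.
outcome=$(bash -c '
  greeting=outer
  true | greeting=inner   # the assignment happens in a subshell
  echo "$greeting"        # the parent shell still sees the old value
')
echo "after pipeline: $outcome"
```

With default options this should print "after pipeline: outer", since the assignment to inner is discarded along with the subshell.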
Instead of a pipe, you can provide input to read (or any command) by redirecting (<) from a file or a process substitution, or by using a here-string (<<< 'string input'). These methods let read run in the current execution environment (and variable scope), so the variables it creates persist.
yes | { read; echo "$REPLY"; } works because read and echo are in the same command group, so they share the same subshell (and variable scope).
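The comparison can be sketched side by side. Each case below runs in its own bash -c so that REPLY from one case cannot leak into the next; the *_result variable names are illustrative only, and ${REPLY-unset} prints the literal word unset when the variable was never set.

```shell
# Each case runs in a fresh, non-interactive bash; variable names are illustrative.
pipe_result=$(bash -c 'unset REPLY; yes | read -r; echo "${REPLY-unset}"')
procsub_result=$(bash -c 'unset REPLY; read -r < <(yes); echo "${REPLY-unset}"')
herestring_result=$(bash -c 'unset REPLY; read -r <<< "y"; echo "${REPLY-unset}"')

echo "pipe:                 $pipe_result"
echo "process substitution: $procsub_result"
echo "here-string:          $herestring_result"
```

Only the pipe case should report unset; the two redirection forms leave REPLY set to y in the shell that ran read.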
shopt -s lastpipe; yes | read; echo "$REPLY" works because the purpose of lastpipe is to make the last command of a pipeline execute in the current environment (not a subshell).
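One detail worth noting: the Bash manual describes lastpipe as taking effect only when job control is not active. Interactive shells enable job control by default, which is why set +m is needed at a prompt, while scripts and bash -c run with it off. The sketch below, assuming Bash 4.2 or later (where lastpipe exists) and using made-up variable names, compares the two settings in a non-interactive shell.

```shell
# Assumes Bash 4.2+; job control is off by default in a non-interactive bash -c.
default_result=$(bash -c 'unset REPLY; yes | read -r; echo "${REPLY-unset}"')
lastpipe_result=$(bash -c 'shopt -s lastpipe; unset REPLY; yes | read -r; echo "${REPLY-unset}"')

echo "without lastpipe: $default_result"
echo "with lastpipe:    $lastpipe_result"
```

Here no set +m is required because nothing turned job control on in the first place.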
To reiterate, this has nothing to do with read specifically, and is entirely related to the fact that pipelines run in subshells. If you want to modify the current environment, you need to do it outside of a pipeline. Bash at least provides process substitution to easily redirect command output.
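Another way to keep the assignment outside the pipeline, if process substitution is not available, is command substitution: let the pipeline produce the text and assign it in the current shell. This is only a sketch, and first_line is a hypothetical name.

```shell
# The pipeline runs in subshells, but the assignment itself happens in the current shell.
first_line=$(yes | head -n 1)
echo "first_line=$first_line"
```

This form also works in plain POSIX sh, at the cost of starting head as an extra process.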

Thank you for a good explanation. The only thing missing is why turning off job control is needed here. Specifically, lastpipe without `set +m` to disable job control still fails within read's environment context. — Todd A. Jacobs, Feb 03 '22 at 14:59
