I can't reproduce the problem using the code in the question (with the modified last line) in Bash version 3, 4, or 5 on Linux or Cygwin. In all tests the code produces a single line of output:
File1.swift
However, I get different output if I add this line at the start of the code:
trap '' SIGPIPE
Then the output becomes:
File1.swift
testprog: line 21: echo: write error: Broken pipe
testprog: line 21: echo: write error: Broken pipe
I think the most likely cause of your problem is that the shell running the program is ignoring the SIGPIPE signal. The default action on receiving the SIGPIPE signal is to exit immediately, and silently. That's why programs like ls, and the unmodified Bash code, behave as they do.
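The difference can be reproduced without the code from the question. This is only a minimal sketch: the function names are made up for illustration, and each pipeline runs in a child bash so that the trap cannot leak into the calling shell. The loop stops at the first failed echo, so it cannot run forever when SIGPIPE is ignored.

```shell
# Hypothetical helpers (the names are illustrative, not from the question).

# Default SIGPIPE handling: the writing side of the pipeline is terminated
# by the signal once head has exited, normally without any message.
sigpipe_default() {
    bash -c 'while echo x; do :; done | head -n 1'
}

# SIGPIPE ignored: the write fails with an error instead of a signal, so the
# echo builtin returns non-zero and Bash may report "write error: Broken pipe".
sigpipe_ignored() {
    bash -c 'trap "" PIPE; while echo x; do :; done | head -n 1'
}

sigpipe_default
sigpipe_ignored
```

Both calls print a single x on standard output. Any difference shows up only on standard error, which matches the symptoms described above.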
Was the code in the question cut down from a larger body of code that includes trap '' SIGPIPE (or similar, e.g. trap "" PIPE) somewhere in it? If it was, you can make the problem go away by simply removing or disabling that line of code.
If you are seeing the problem with exactly the code in the question, then something non-obvious is causing the default SIGPIPE handling to be disabled. One possibility is a configuration file that is being read due to an environment setting. You might be able to prevent that by changing the shebang line to:
#!/bin/bash -p
The -p option disables the use of some environment variables that can affect how Bash programs behave. I always use it to reduce the chance of surprises.
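One of the variables that -p causes Bash to disregard is BASH_ENV, which names a file that a non-interactive Bash reads before running a script. The sketch below shows the effect with a throwaway file; the file and its contents are made up for illustration. A real startup file could just as easily contain trap '' PIPE.

```shell
# Hypothetical demo: a temporary startup file referenced by BASH_ENV.
envfile=$(mktemp)
echo 'echo "startup file was read"' > "$envfile"

# Without -p, a non-interactive bash reads the file named by BASH_ENV first.
without_p=$(BASH_ENV=$envfile bash -c 'echo main')

# With -p, the BASH_ENV file is not processed.
with_p=$(BASH_ENV=$envfile bash -p -c 'echo main')

rm -f "$envfile"
printf 'without -p:\n%s\n\nwith -p:\n%s\n' "$without_p" "$with_p"
```

Only the run without -p should include the "startup file was read" line.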
If that doesn't help, you could try explicitly restoring the default handling of SIGPIPE by putting this at the start of the code:
trap - SIGPIPE
If that doesn't work, a possible cause of your problem is that your Bash has been built to ignore SIGPIPE signals by default. It seems unlikely that this is normal for Bash on macOS (I'd expect to see your question being asked often if it were), but I don't have access to a macOS system for testing. The easiest workaround would be to create a SIGPIPE handler that explicitly exits:
trap 'exit 1' SIGPIPE
If the code is part of a program that needs to have SIGPIPE ignored elsewhere, then you will need to change the implementation of the function. The simplest option may be to run the body of the function in a subshell and set default SIGPIPE handling in that subshell:
function extractCommentParams {
    (
        trap - SIGPIPE
        local key=$1
        ...
    )
}
Note the use of parentheses (aka round brackets), which create a subshell, instead of braces (aka curly brackets). That will create an extra subprocess when the program is run, but it shouldn't make it noticeably slower given that the code runs a subprocess (sed) per line of input.
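The following sketch illustrates the idea that a trap changed inside a subshell does not affect the shell that created it. It is not the function from the question: the function name and the loop are made up, and everything runs in a child bash so that the trap stays contained.

```shell
# Hypothetical demo (illustrative names only).
subshell_trap_demo() {
    bash -c '
        trap "" PIPE                  # the main shell ignores SIGPIPE
        ( trap - PIPE                 # the subshell asks for the default again
          while echo x; do :; done
        ) | head -n 1
        trap -p PIPE                  # show the trap still set in the main shell
    '
}

subshell_trap_demo
```

The first line of output is the x from head. The trap -p command then lists whatever SIGPIPE trap is still in force in the main shell, which should be the ignore trap set on the first line.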
Another option is to use sed to do all of the work, not run it line-by-line:
#!/bin/bash -p
# title = My Test Script
# include = File1.swift
# include = File2.swift
# include = Source File 3.swift
function extractCommentParams
{
    local -r key=$1
    local keyEscaped
    keyEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$key")
    local -r s='[[:space:]]'
    local -r S='[^[:space:]]'
    local -r soughtRx="^$s*#$s*$keyEscaped$s*[=:]$s*\\(.*$S\\)$s*\$"
    sed -n "s/$soughtRx/\\1/ip"
}
cat "$0" | extractCommentParams "include" | head -n 1
- This works regardless of the SIGPIPE handling in the shell because the SIGPIPE signal is seen only by sed, not the shell.
- See "Is it possible to escape regex metacharacters reliably with sed" for an explanation of keyEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$key").
- For maximum portability (/robustness?) I used only POSIX sed features (so no -E option and no \s or \S patterns). I used variables s and S in an attempt to make the regular expression more readable. I suspect that some people wouldn't actually find it more readable.
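To see what the escaping step produces, here is a small sketch that wraps the same sed command in a helper. The helper name escape_for_bre is made up for illustration, and it uses printf piped into sed rather than a here-string.

```shell
# Hypothetical helper wrapping the sed escaping command used above.
# Every character except ^ is wrapped in a bracket expression, and each ^
# is preceded by a backslash, so the result matches the input literally.
escape_for_bre() {
    printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

escape_for_bre 'a.b'     # prints [a][.][b]
escape_for_bre '^x'      # prints \^[x]
```

Because each character ends up in its own bracket expression, a key such as a.b can no longer match axb.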
In most cases, it's better to use sed, or other specialist tools, instead of shell loops to process text. See "Why is using a shell loop to process text considered bad practice?" for more information on this topic. If, for reasons not obvious in the question, you are sure that you really need to use a shell loop for what you are doing, then you can make it much faster by using the built-in regular expression support in Bash rather than running sed on each line. Here is one way to do it:
function extractCommentParams
{
    local -r key=$1
    local -r preKeyRx="^[[:space:]]*#[[:space:]]*"
    local -r postKeyRx="[[:space:]]*[=:][[:space:]]*(.*[^[:space:]])[[:space:]]*\$"
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $preKeyRx"$key"$postKeyRx ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}
It appears from information in the comments that the SIGPIPE signal may not be disabled but is not being handled properly. I guess there may be a bug in the version of Bash being used. It also appears from the comments that SIGPIPE signals are being handled correctly in programs called from the shell (e.g. ls). If that is the case, one workaround for the problem is to ensure that all output is done by external programs. The sed-only solution above should just work. The line-by-line solution can be modified to use the printf program instead of the printf builtin. One way to do it is:
function extractCommentParams
{
    local -r key=$1
    local -r preKeyRx="^[[:space:]]*#[[:space:]]*"
    local -r postKeyRx="[[:space:]]*[=:][[:space:]]*(.*[^[:space:]])[[:space:]]*\$"
    local outputs=()
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $preKeyRx"$key"$postKeyRx ]]; then
            outputs+=( "${BASH_REMATCH[1]}" )
        fi
    done
    if (( ${#outputs[*]} > 0 )); then
        env printf '%s\n' "${outputs[@]}"
    fi
}
- To minimize performance problems due to running a subprocess for every line of output, collect the output lines in an array (outputs) and print them all together when the input has been fully processed.
- Using env printf ... causes the printf command to be found on the PATH instead of using the builtin.
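As a small illustration of the env technique on its own, the sketch below prints its arguments with the external printf. The function name print_lines_external is made up, and the sketch assumes that a printf program exists on the PATH (it does on typical Linux and macOS systems).

```shell
# type reports how the shell resolves a command name; in Bash, printf is
# normally reported as a shell builtin.
type printf

# Hypothetical helper: print each argument on its own line using the
# external printf program, which env looks up on the PATH.
print_lines_external() {
    # With no arguments, printf '%s\n' would still print one empty line,
    # so return early in that case.
    [ "$#" -gt 0 ] || return 0
    env printf '%s\n' "$@"
}

print_lines_external 'File1.swift' 'Source File 3.swift'
```

The early return plays the same role as the (( ${#outputs[*]} > 0 )) check in the function above: it avoids printing a spurious blank line when nothing matched.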
Finally, if you really need to use a loop and printf to produce the output, and you really need to have SIGPIPE ignored, then you will need to handle output errors explicitly.
Unconditionally "swallowing STDERR" would not be a good idea because output can fail for reasons other than a broken pipe (filesystem full, file reached a size limit, network file became inaccessible, ...). This is a (somewhat rough-and-ready) attempt at an acceptable way to do it:
function extractCommentParams
{
    local -r key=$1
    local -r preKeyRx="^[[:space:]]*#[[:space:]]*"
    local -r postKeyRx="[[:space:]]*[=:][[:space:]]*(.*[^[:space:]])[[:space:]]*\$"
    exec 3>&1
    local line
    local -x LC_ALL=C
    local printf_stderr
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $preKeyRx"$key"$postKeyRx ]]; then
            if ! printf_stderr=$(printf '%s\n' "${BASH_REMATCH[1]}" 2>&1 1>&3); then
                [[ $printf_stderr == *'Broken pipe' ]] \
                    || printf '%s\n' "$printf_stderr" >&2
                exec 3>&-
                return 1
            fi
        fi
    done
    exec 3>&-
}
- exec 3>&1 associates file descriptor 3 with the standard output of the function, so it can be accessed from the command substitution below it. This is dangerous because it may clash with use of file descriptor 3 elsewhere in the real code. Bash 4.1 and later have a mechanism for ensuring that file descriptors don't clash (the {varname}> redirection form), but I assume this code has to work with the standard Bash (version 3) on macOS.
- exec 3>&- later in the code closes file descriptor 3.
- local -x LC_ALL=C sets the locale in the function, and everything called from it, to the C/POSIX locale. This is to try to ensure that error messages will be consistent (ASCII text, English) on all systems so pattern matching against them has some chance of working.
- printf ... 2>&1 1>&3 redirects the standard error of printf so it is captured by $(...) and redirects the standard output to the standard output of the function.
- The [[ $printf_stderr == *'Broken pipe' ]] || printf ... >&2 test prints the error message output by printf unless it refers to a broken pipe.
- The return 1 is there because functions should always return non-zero values in case of error. Non-zero exit status can make a difference even within a pipeline because set -o pipefail may be in effect and cause the pipeline status to be taken from the last stage to exit with non-zero status. Even if pipefail is not used, the exit statuses of pipeline stages can be inspected in the PIPESTATUS array.
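The file descriptor juggling is easier to follow in isolation. The sketch below applies the same redirections to a made-up command; emit_both and capture_stderr_only are hypothetical names used only for this illustration.

```shell
# Hypothetical stand-in for a command that writes to both output streams.
emit_both() {
    echo 'payload'
    echo 'diagnostic' >&2
}

capture_stderr_only() {
    local captured
    exec 3>&1                          # fd 3 now refers to this function's stdout
    captured=$(emit_both 2>&1 1>&3)    # stderr is captured, stdout goes to fd 3
    exec 3>&-                          # close fd 3 again
    printf 'captured: %s\n' "$captured"
}

# Run in a subshell so the changes to fd 3 cannot affect the calling shell.
( capture_stderr_only )
```

This prints payload followed by captured: diagnostic. The order of the two redirections matters: 2>&1 must come first so that standard error is attached to the command substitution before standard output is pointed at file descriptor 3.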