Is there a way to groom the strings so the final sed output looks
like the input?
Here's a bash demo script which reads strings from a temporary JSON
file into an indexed array and has GNU sed write its own conversion
script to edit a template.
Note that \n, \r, \t, \u etc. in the JSON source will be converted
by jq -r before bash and sed see them. The bash script reads
newline-delimited lines and does not work for JSON strings containing \n.
More comments below.
#!/bin/bash
jsonfile="$(mktemp)" templatefile="$(mktemp)"
# shellcheck disable=SC2064
trap "rm -f -- '${jsonfile}' '${templatefile}'" INT EXIT
cat << 'HERE' > "${jsonfile}"
{
"Name":"A1",
"Desc":"*A* \\1 /does/ 'Q&A' for you\tand \"other things\" \\@ $HOME !"
}
HERE
printf '%s\n' '---EVTITLE---' > "${templatefile}"
mapfile -t vars < <(
jq -r '.Name, .Desc' < "${jsonfile}"
)
wait "$!" || exit ## abort if jq failed
# shellcheck disable=SC2034
name="${vars[0]}" desc="${vars[1]}"
printf '%s\n' "${desc}" |
tee /dev/stderr |
sed -e 's/[\\/&\n]/\\&/g' -e 's/.*/s\/EVTITLE\/&\//' |
tee /dev/stderr |
sed -f /dev/stdin "${templatefile}"
These are the 3 lines output by the script (with tabs expanding to
different lengths) showing the contents of:
- the shell variable desc
- the generated sed script
- the edited template file
*A* \1 /does/ 'Q&A' for you and "other things" \@ $HOME !
s/EVTITLE/*A* \\1 \/does\/ 'Q\&A' for you and "other things" \\@ $HOME !/
---*A* \1 /does/ 'Q&A' for you and "other things" \@ $HOME !---
bash stores the string it reads and passes it on without modification
using printf to sed, which in turn adds escapes as needed for a
replacement string to be inserted between s/EVTITLE/ and /, i.e.
the sed script required to edit the template file.
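The caveat about JSON strings containing \n comes from how mapfile -t splits its input. As a minimal sketch, printf stands in here for what jq -r would emit for two fields whose second value contains a decoded newline; the sample values are made up for illustration.

```bash
#!/bin/bash
# printf stands in for `jq -r '.Name, .Desc'`; the values are hypothetical.
# The second value spans two lines, as jq -r would print a decoded \n.
mapfile -t vars < <(
    printf '%s\n' 'A1' 'first line
second line'
)

# Two JSON fields went in, but three array elements come out.
printf 'elements read: %d\n' "${#vars[@]}"
printf 'vars[1]=%s\n' "${vars[1]}"
printf 'vars[2]=%s\n' "${vars[2]}"
```

The array indices no longer line up with the JSON fields, which is why the demo script only suits single-line strings.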
In the replacement section of a sed substitute (s) command the
following have a special meaning according to POSIX:
- \ (backslash), the escape character itself
- the s command delimiter; the default is / but it may be anything
other than backslash and newline
- & (ampersand), referencing the entire matched portion
- \N (N is one of the digits 1 through 9), referencing a matched group
- a literal newline
However, several sed implementations recognize other escapes in the
replacement. For example, GNU sed will replace \f, \n, \t, \v etc. as
in C, and (unless the --posix option is given) its extensions \L, \l,
\U, \u, and \E act on the replacement.
(More on this by info sed -n 'The "s" Command', info sed -n Escapes,
info sed --index-search POSIXLY_CORRECT.)
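As a rough illustration of these special characters, the following sketch runs a few one-line substitutions on a made-up input string. The variable names are only for this example, and the last command assumes GNU sed; other implementations may print something different for it.

```bash
#!/bin/bash
input='foo bar'   # made-up sample text

# & inserts the whole match, \1 the first group.
refs="$(printf '%s\n' "${input}" | sed -e 's/\(foo\) bar/[&] [\1]/')"
# Prefixed with a backslash, & stands for itself.
literal_amp="$(printf '%s\n' "${input}" | sed -e 's/foo bar/Q\&A/')"
# With , as the delimiter, / needs no escape in the replacement.
other_delim="$(printf '%s\n' "${input}" | sed -e 's,foo,a/b,')"
# A doubled backslash yields one literal backslash, so \t is not expanded.
literal_bs="$(printf '%s\n' "${input}" | sed -e 's/foo/a\\tb/')"
# Assumption: sed is GNU sed, where \U upper-cases the text that follows.
upper="$(printf '%s\n' "${input}" | sed -e 's/foo/\Ufoo/')"

printf '%s\n' "${refs}" "${literal_amp}" "${other_delim}" "${literal_bs}" "${upper}"
```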
What this means is that all backslash, command delimiter, ampersand,
and newline characters in the input must be escaped, i.e. prefixed with
a backslash, if they are to represent themselves when used in a
replacement section. This is done by asking sed to s/[\\/&\n]/\\&/g.
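The same escaping step can be wrapped in a small function. This is only a sketch: escape_replacement is a hypothetical helper name, the sample value is made up, and it assumes GNU sed, / as the delimiter, and single-line input.

```bash
#!/bin/bash
# Hypothetical helper (not part of the demo script): escape one line of
# text for use as the replacement in s/.../.../ with / as the delimiter.
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\\/&\n]/\\&/g'
}

raw='Q&A \1 /tmp'                        # made-up sample value
escaped="$(escape_replacement "${raw}")"

# The escaped text can now be spliced into an s command as the replacement.
result="$(printf '%s\n' '---EVTITLE---' | sed -e "s/EVTITLE/${escaped}/")"

printf '%s\n' "${escaped}" "${result}"
```

With a different delimiter, the bracket expression would need to list that character in place of /.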
Recall that most of the meta characters used in regular expressions
(and the shell, for that matter), such as ^$.*[]{}(), have no special
meaning when appearing in the replacement section of sed's s
command and so should not be escaped there. Conversely, & is not
a regex meta character, yet it must be escaped in the replacement.
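A short sketch of that contrast, using a one-line stand-in for the template (the variable names are only for this example):

```bash
#!/bin/bash
# Regex meta characters are ordinary text in the replacement section.
meta="$(printf '%s\n' 'EVTITLE' | sed -e 's/EVTITLE/^$.*[]{}()/')"
# An unescaped & is not ordinary there: it re-inserts the matched text.
amp="$(printf '%s\n' 'EVTITLE' | sed -e 's/EVTITLE/Q&A/')"

printf '%s\n' "${meta}" "${amp}"
```

The first line printed is the meta characters unchanged; the second shows EVTITLE substituted where the & was.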
//g' ep_sed='s/\<\/p\>//g' amp_sed='s/\&/+/g;s/amp\;//g' dquote_sed='s/\"//g' squote_sed="s/\'//g" DESC=$(echo $Desc | sed -e "$p_sed;$ep_sed;$amp_sed;$dquote_sed;$squote_sed")`
– smartblonde Sep 17 '21 at 16:44