"sed" special characters handling

Question

We have a sed command in our script that replaces placeholders in a file with values from variables.

For example:

export value="dba01upc\Fusion_test"
sed -i "s%{"sara_ftp_username"}%$value%g" /home_ldap/user1/placeholder/Sara.xml

The sed command drops special characters such as '\' and inserts the string "dba01upcFusion_test", without the '\'. It works if I do the export as export value='dba01upc\Fusion_test' (with the '\' surrounded by ''), but unfortunately our client wants to export the original text dba01upc\Fusion_test with single/double quotes and doesn't want to add any extra characters to the text. Can anyone tell me how to make sed insert the text with its special characters intact?

Before Replacement : Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account >
<ser:description/>
<ser:static-account>
<con:username>{sara_ftp_username}</con:username>
</ser:static-account>
</ser:service-account>

After Replacement : Sara.xml

<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account>
<ser:description/>
<ser:static-account>
<con:username>dba01upcFusion_test</con:username>
</ser:static-account>
</ser:service-account>

Thanks in advance

FYI, BashFAQ #21 contains an awk script usable for reliable replacements; see http://mywiki.wooledge.org/BashFAQ/021 — Charles Duffy, Jul 23 '14 at 17:57
There's no need to export variables that don't need to be accessed from subprocesses... and for variables holding passwords/credentials, exporting them is just plain a bad idea (it's fixed on most very new UNIXlikes, but some older systems made environment variables publicly visible in the same way that command lines are). — Charles Duffy, Jul 23 '14 at 18:04
wrt the referenced awk script, you don't need to escape the backslashes if you pass the strings in in the arg list instead of populating variables from them. I'm not convinced it's safe to blindly escape every backslash anyway, but I could be wrong - some day when I have nothing else to do I'll actually try to see if I can come up with a counter-example. — Ed Morton, Jul 23 '14 at 19:24
@CharlesDuffy: In your linked `awk` script, aside from needing to escape backslashes, there's also the issue of actual newlines in the input string(s), which break the command when using BSD `awk` (and, from what I understand, this behavior may be POSIX-compliant). Using the pass-the-strings-as-pseudo-filenames technique in [@Ed's answer](http://stackoverflow.com/a/24898517/45375) avoids that problem, while also obviating the need to escape. — mklement0, Jul 23 '14 at 21:03

Ed Morton · Answer 1 · 2014-07-24T00:01:38.577

You cannot robustly solve this problem with sed. Just use awk instead:

awk -v old="string1" -v new="string2" '
idx = index($0,old) {
    $0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file

Ah, @mklement0 has a good point - to stop escapes from being interpreted you need to pass in the values in the arg list along with the file names and then assign the variables from that, rather than assigning values to the variables with -v (see the summary I wrote a LONG time ago for the comp.unix.shell FAQ at http://cfajohnson.com/shell/cus-faq-2.html#Q24 but apparently had forgotten!).
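To see the difference in isolation, here is a minimal sketch (the variable names `via_v` and `via_argv` are made up for this illustration). It measures the length of the same four-character string `a\tb` after handing it to awk both ways:

```shell
# A -v assignment undergoes escape-sequence processing, so \t becomes one tab.
via_v=$(awk -v s='a\tb' 'BEGIN { print length(s) }')

# An element of ARGV is used verbatim, so the backslash and the t both survive.
via_argv=$(awk 'BEGIN { print length(ARGV[1]) }' 'a\tb')

printf '%s %s\n' "$via_v" "$via_argv"   # prints: 3 4
```

The first result is one character short because awk has already turned `\t` into a tab by the time the script sees it.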

The following will robustly make the desired substitution (a\ta -> e\tf), replacing every occurrence of the search string on every line:

$ cat tst.awk
BEGIN {
    old=ARGV[1]; delete ARGV[1]
    new=ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail,old) ) {
        head = head substr(tail,1,idx-1) new
        tail = substr(tail,idx+lgthOld)
    }
    print head tail
}

$ cat file
a\ta    a       a       a\ta

$ awk -f tst.awk 'a\ta' 'e\tf' file
e\tf    a       a       e\tf

The white space in file is tabs. You can shift ARGV[3] down and adjust ARGC if you like but it's not necessary in most cases.
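Applied to the case in the question, the same script might be used as sketched below. This is only an illustration: the temporary file stands in for Sara.xml, the awk program is inlined rather than kept in tst.awk, and the variable names `tmpl` and `result` are made up.

```shell
# Hypothetical stand-in for Sara.xml, holding one line with the placeholder.
tmpl=$(mktemp)
printf '%s\n' '<con:username>{sara_ftp_username}</con:username>' > "$tmpl"

value='dba01upc\Fusion_test'

# Search and replacement strings are passed as the first two arguments and
# removed from ARGV, so awk never interprets escapes in them.
result=$(awk '
BEGIN {
    old = ARGV[1]; delete ARGV[1]
    new = ARGV[2]; delete ARGV[2]
    lgthOld = length(old)
}
{
    head = ""; tail = $0
    while ( idx = index(tail, old) ) {
        head = head substr(tail, 1, idx - 1) new
        tail = substr(tail, idx + lgthOld)
    }
    print head tail
}' '{sara_ftp_username}' "$value" "$tmpl")

printf '%s\n' "$result"   # <con:username>dba01upc\Fusion_test</con:username>
rm -f "$tmpl"
```

To change the file itself rather than print the result, one option is to redirect awk's output to a temporary file and then move that over the original.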

edited Jul 24 '14 at 00:01

answered Jul 22 '14 at 21:52

Ed Morton


While this is more robust than @anubhava's `printf "%q"` solution, it still breaks with strings such as `'\1'`, because of how `awk` parses escape sequences in string literals. You'd have to "pre-escape" `\ ` instances to fix that. – mklement0 Jul 23 '14 at 15:44
Good point, but the solution is to move the assignment, not escape them (you can't just escape backslashes as they may already be escaping something). See my updated solution. – Ed Morton Jul 23 '14 at 17:03
+1 for the input-via-pseudo-filenames revision, because it doesn't require pre-escaping the input. However, if the input strings are truly to be taken as _literals_, I don't think that blindly `\ `-escaping `\ ` instances is a problem (after the escaping `\ ` instances are "eaten", you should end up with the original string). Haven't found a problem with my ``sed`` solution yet, but do let me know if you know how to break it. – mklement0 Jul 23 '14 at 19:22
I'm really not sure if blindly escaping `\ `s has issues or not. Some day I'll think about it more and see if I can convince myself it's OK or prove it's not. Your sed solution didn't behave as desired for me when there was a literal newline in the replacement string - I got `sed: -e expression #1, char 30: unterminated `s' command`. In general I think escaping chars in the replacement string is MUCH less problematic than in the search string where which chars should be and which must not be escaped is a larger set and dependent on your sed version and options (eg BRE vs ERE). – Ed Morton Jul 23 '14 at 19:30
Ah, good point re newlines: fixed, thanks. (As an aside, your switch to the input-via-pseudo-filenames approach also avoided the same problem with BSD `awk` (GNU `awk` and `mawk` are fine with unescaped actual newlines in literals - possibly in contradiction to POSIX)). – mklement0 Jul 23 '14 at 19:46
I think the main problem I'm having with the idea of escaping RE meta-characters (and with sed also delimiters and back-references) to try to make the tool treat the given string as literal is just that it's conceptually the wrong approach. It's like wanting a glass of water and going to the freezer for ice cubes then microwaving them until they thaw instead of simply pouring a glass of water. We want to operate on strings, so let's just pass in strings and use string functions/operations. All this stuff about trying to turn strings into "safe" REs so we can use RE functions/operations is ugly. – Ed Morton Jul 23 '14 at 19:46
Good point, especially re escaping the _search_ string. The pre-escaping hoops my `sed` solution has to jump through further support your point, and it only covers the _replacement_ string. Your revised solution is now indeed the most robust and generic - and simpler to boot. – mklement0 Jul 23 '14 at 20:10
Thanks. I just noticed the OP had a `g` at the end of his sed command so I modified my answer to replace every occurrence of the search string instead of just the first on each line. – Ed Morton Jul 23 '14 at 20:29

mklement0 · Answer 2 · 2018-01-31T07:08:07.987

Update with the benefit of hindsight, to present options:

- Update 2: If you're intent on using sed, see the - somewhat cumbersome, but now robust and generic - solution below.
- If you want a robust, self-contained awk solution that also properly handles both arbitrary search and replacement strings (but cannot incorporate regex features such as word-boundary assertions), see Ed Morton's answer.
- If you want a pure bash solution and your input files are small and preserving multiple trailing newlines is not important, see Charles Duffy's answer.
- If you want a full-fledged third-party templating solution, consider, for instance, j2cli, a templating CLI for Jinja2 - if you have Python and pip, install with sudo pip install j2cli.
Simple example (note that since the replacement string is provided via a file, this may not be appropriate for sensitive data; note the double braces ({{...}})):
```
value='dba01upc\Fusion_test'
echo "sara_ftp_username=$value" >data.env
echo '<con:username>{{sara_ftp_username}}</con:username>' >tmpl.xml
j2 tmpl.xml data.env # -> <con:username>dba01upc\Fusion_test</con:username>
```

If you use sed, careful escaping of both the search and the replacement string is required, because:

- As Ed Morton points out in a comment elsewhere, sed doesn't support use of literal strings as replacement strings - it invariably interprets special characters/sequences in the replacement string.
- Similarly, the search string literal must be escaped in a way that its characters aren't mistaken for special regular-expression characters.
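As a quick illustration of the first point, here is a small sketch that reproduces the symptom from the question. The variable names `lost_backslash` and `ampersand` are made up, and the first result assumes GNU sed, since POSIX leaves the meaning of a backslash before an ordinary character such as `F` unspecified:

```shell
value='dba01upc\Fusion_test'

# The backslash reaches sed intact, but sed reads \F in the replacement
# as an escape sequence and (with GNU sed) emits just F.
lost_backslash=$(printf '%s\n' '{sara_ftp_username}' |
  sed "s%{sara_ftp_username}%$value%g")

# An unescaped & in the replacement stands for the matched text.
ampersand=$(printf '%s\n' 'user' | sed 's/user/R&D/')

printf '%s\n%s\n' "$lost_backslash" "$ampersand"
# With GNU sed this prints:
#   dba01upcFusion_test
#   RuserD
```

So the backslash is not lost by the shell here. It is consumed by sed when it parses the replacement part of the `s` command.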

The following uses two generic helper functions that perform this escaping (quoting), applying the techniques explained at "Is it possible to escape regex characters reliably with sed?":

#!/usr/bin/env bash

# SYNOPSIS
#   quoteRe <text>
# DESCRIPTION
#   Quotes (escapes) the specified literal text for use in a regular expression,
#   whether basic or extended - should work with all common flavors.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }

# '

# SYNOPSIS
#  quoteSubst <text>
# DESCRIPTION
#  Quotes (escapes) the specified literal string for safe use as the substitution string (the 'new' in `s/old/new/`).
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"    
}

# The search string.
search='{sara_ftp_username}'

# The replacement string; a demo value with characters that need escaping.
value='&\1%"'\'';<>/|dba01upc\Fusion_test'

# Use the appropriately escaped versions of both strings.
sed "s/$(quoteRe "$search")/$(quoteSubst "$value")/g" <<<'<el>{sara_ftp_username}</el>'

# -> <el>&\1%"';<>/|dba01upc\Fusion_test</el>
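To make it easier to see what the two helpers actually emit, here is a small sketch that prints their output for two short, made-up inputs. The helper definitions are repeated so that the snippet runs on its own; the only change is an `|| true` after `read`, which returns non-zero when it hits the end of input.

```shell
#!/usr/bin/env bash

# Helper definitions repeated from above so that this snippet is self-contained.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
quoteSubst() {
  # `|| true` is added here only because read returns non-zero at end of input.
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

# Each character except ^ becomes a one-character bracket expression; ^ becomes \^.
re=$(quoteRe 'a.b^c')
# Each of &, / and \ gains a leading backslash.
subst=$(quoteSubst 'a&b/c\d')

printf '%s\n%s\n' "$re" "$subst"
# [a][.][b]\^[c]
# a\&b\/c\\d

# The quoted regex matches only the literal text, not the look-alike "axb^c".
matched=$(sed "s/$re/X/" <<<'a.b^c axb^c')
printf '%s\n' "$matched"   # X axb^c
```

Wrapping each character in its own bracket expression is what lets the same output work in both basic and extended regular expressions.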

Both quoteRe() and quoteSubst() correctly handle multi-line strings.
- Note, however, that since sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
quoteRe() is always safe to use with a command substitution ($(...)), because it always returns a single-line string (newlines in the input are encoded as '\n').
By contrast, if you use quoteSubst() with a string that has trailing newlines, you mustn't use $(...), because the latter will remove the last trailing newline and therefore break the encoding (since quoteSubst() \-escapes actual newlines, the string returned would end in a dangling \).
- Thus, for strings with trailing newlines, use IFS= read -d '' -r escapedValue < <(quoteSubst "$value") to read the escaped value into a separate variable first, then use that variable in the sed command.
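The trailing-newline case can be sketched as follows, using a made-up two-line replacement value. The helper is repeated so that the snippet runs on its own, and the `|| true` after each `read` is an addition for this illustration, because `read -d ''` returns non-zero at the end of input.

```shell
#!/usr/bin/env bash

# Helper repeated from above so that this snippet is self-contained.
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1") || true
  printf %s "${REPLY%$'\n'}"
}

value=$'line1\nline2\n'   # hypothetical replacement text that ends in a newline

# $(quoteSubst "$value") would strip the final newline and leave a dangling \,
# so read the escaped text into a variable instead.
IFS= read -d '' -r escapedValue < <(quoteSubst "$value") || true

result=$(sed "s/X/$escapedValue/" <<<'[X]')
printf '%s\n' "$result"
# [line1
# line2
# ]
```

The closing bracket lands on a line of its own, which shows that the trailing newline of the replacement text was preserved.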

+1 Much more robust solution than mine (deleted now) – anubhava Jul 23 '14 at 15:59

Charles Duffy · Answer 3 · 2014-07-23T19:16:11.200

This can be done with bash builtins alone -- no sed, no awk, etc.

orig='{sara_ftp_username}'               # put the original value into a variable
new='dba01upc\Fusion_test'               # ...no need to 'export'!

contents=$(<Sara.xml)                    # read the file's content into a variable
new_contents=${contents//"$orig"/$new}   # use parameter expansion to replace every occurrence
printf '%s' "$new_contents" >Sara.xml    # write new content to disk

See the relevant part of BashFAQ #100 for information on using parameter expansion for string substitution.
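One caveat that postdates this answer, as far as I know: starting with bash 5.2 the `patsub_replacement` shell option is on by default, and an unquoted `&` in the replacement then stands for the matched text. Quoting the replacement keeps it literal. A minimal sketch, using a made-up value that contains both `&` and `\`, and a string in place of the file contents:

```shell
#!/usr/bin/env bash

orig='{sara_ftp_username}'
new='R&D\Fusion_test'      # hypothetical value containing both & and \
contents='<con:username>{sara_ftp_username}</con:username>'

# Quoting "$orig" makes bash treat it as a literal rather than a glob pattern.
# Quoting "$new" keeps & literal even when patsub_replacement is enabled.
new_contents=${contents//"$orig"/"$new"}

printf '%s\n' "$new_contents"   # <con:username>R&D\Fusion_test</con:username>
```

On older bash versions the extra quotes around `$new` should make no difference in a plain assignment like this one, so quoting both sides seems the safer habit.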

edited Jul 23 '14 at 19:16

answered Jul 23 '14 at 17:59

Charles Duffy


True, but the general caveat is that it's _only suitable for smaller files_, given that the entire input file is read _at once_. Also, if you want the contents of `$orig` to be treated as a _literal_, you must double-quote it: `new_contents=${contents//"$orig"/$new}` (whereas you _can_, but _needn't_ double-quote `$new`). – mklement0 Jul 23 '14 at 19:12
@mklement0, thanks -- I learned something there; wasn't previously aware that `$orig` would be treated as a pattern in that case. – Charles Duffy Jul 23 '14 at 19:16
Note that this approach will remove any blank lines from the end of the file, which might not be desirable. – Ed Morton Jul 23 '14 at 19:38
@EdMorton: Good point; and, to clarify, since "blank" could be interpreted to include _all-whitespace_ lines, we're talking about trailing _empty_ lines (a run of one or more newline characters at the very end of the file). With that caveat noted, it's probably better to output the modified string with a trailing newline: `printf '%s\n' ...` – mklement0 Jul 23 '14 at 19:54
@CharlesDuffy: Glad to hear it; +1 for a solution that is handy, if one knows the constraints. – mklement0 Jul 23 '14 at 19:57
@mklement0 right, unless, of course, the original file didn't HAVE a terminating newline and then we're once again doing something potentially undesirable to it :-). – Ed Morton Jul 23 '14 at 20:03
@EdMorton: Point taken, but given that the only two options here are to _never_ or _always_ add a trailing newline, I'd opt for _always_, as that is more typical, with likely fewer surprises down the road. – mklement0 Jul 23 '14 at 20:53
@mklement0 agreed, my awk script would add a newline if one wasn't present but then in gawk at least you always have the option of adding it or not by setting `ORS=RT` if that's really an issue. In other awks, you'd need to get creative! – Ed Morton Jul 23 '14 at 21:15

