I have replaced URLs using sed in the past with no problem. However, this URL in particular is giving me trouble. It has quite a few ampersands and I need to replace them. How would I go about doing that?

sed -i.bak "s#<string>https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&amp;blah=Y&amp;blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null</string>#<string>https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132&cm_mmc=BLAH-_-BLAH-_-Null-_-Null</string>#g" path/to/xml/file

My problem is that it's not fully replacing the url. How do I escape the ampersands so I can successfully replace www.url1toreplace.com with www.urltoreplace.com and everything that follows?

user3124081
  • not sure I understand what you mean by `replace ... **everything that follows**`; at a minimum could you provide an example of actual data, both the before and after values – markp-fuso Sep 07 '17 at 21:08
  • Hi. Sorry if that was not clear. I'm looking to replace https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&blah=Y&blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null with https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132&cm_mmc=BLAH-_-BLAH-_-Null-_-Null – user3124081 Sep 07 '17 at 21:17
  • I guess try running the command I provided in the question. Does it work for you? – user3124081 Sep 07 '17 at 21:20
  • 1) don't have the xmlfile you're querying; 2) can't tell if the excessive `blah`'s are part of the actual data or an attempt to mask the real data; at this point I'm assuming you want to replace **everything** ... URL and all parameters ... between `https://www.` and `</string>` – markp-fuso Sep 07 '17 at 21:26

2 Answers

In the replacement text, you need to escape &.

For example, without the escape, the whole of the original match is substituted in for each &:

$ echo '&amp;' | sed 's#&amp;#a & b & c#'
a &amp; b &amp; c

With the escape, \&, & is treated as an ordinary character:

$ echo '&amp;' | sed 's#&amp;#a \& b \& c#'
a & b & c
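
As a small sketch of the same rule (the sample string `a&b` is made up for illustration): & only has a special meaning in the replacement part. In the search pattern it is an ordinary character and needs no escaping.

```shell
# & in the pattern is literal; & in the replacement stands for the whole match.
printf '%s\n' 'a&b' | sed 's#a&b#[&]#'     # prints [a&b]

# Escaped in the replacement, \& is a plain ampersand.
printf '%s\n' 'a&b' | sed 's#a&b#[\&]#'    # prints [&]
```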

Your example

Let's take this test file:

$ cat file
<string>https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&amp;blah=Y&amp;blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null</string>

And run the original command:

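$ sed "s#<string>https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&amp;blah=Y&amp;blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null</string>#<string>https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132&cm_mmc=BLAH-_-BLAH-_-Null-_-Null</string>#g" file
<string>https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132<string>https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&amp;blah=Y&amp;blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null</string>cm_mmc=BLAH-_-BLAH-_-Null-_-Null</string>

The above command fails: the unescaped & in the replacement is replaced by the entire matched text. If we escape the &, however, we get: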

$ sed 's#<string>https://www.url1toreplace.com?blah=1234585474738743874386328764287364238746283764287346872364fN&amp;blah=Y&amp;blah=%2Fwebapp%2Fwcs%2Fblahblah%2Fblah%2Fen%2Fblahahah%3Fblah%3e212e123152%26cm_mmc%3DBLAH-_-BLAH-_-Null-_-Null</string>#<string>https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132\&cm_mmc=BLAH-_-BLAH-_-Null-_-Null</string>#g' file
<string>https://www.urltoreplace.com/blah/blah/blah/blah/en/blah?blah=129i312093132&cm_mmc=BLAH-_-BLAH-_-Null-_-Null</string>

This succeeds: the & in the replacement string successfully appears in the output.

John1024

Sample data file:

$ cat xfile
<string>https://www.old.home.com?x=123&amp;y=abc&amp;z=ABC_mmc%3D</string>

Desired output:

<string>https://www.new.home.biz?A=XYZ&amp;B=123&amp;C=987_jjj%2XD</string>

As John1024's already pointed out, if a sed replacement string contains &'s, the &'s have to be escaped (\&) (because & has a special meaning to sed).

Hmmmm, but that could be a major pain in the keister if ya gotta go through and (manually?) change all sed replacement patterns from & to \&. But this replacement can be automated with a few minor assumptions ...

Assumptions:

  • search and replace patterns can be stored in variables before and after, respectively (actually, only the after variable is needed for this idea to work, but for this example I'll use before and after variables)
  • before and after contain normal strings w/out any special escapes
  • your version of bash supports character replacement via the ${var// /} construct

Apply escapes to the after variable on the fly:

$ before='old.home.com?x=123&amp;y=abc&amp;z=ABC_mmc%3D'

$ after='new.home.biz?A=XYZ&amp;B=123&amp;C=987_jjj%2XD'

$ sed "s#${before}#${after//\&/\\\&}#g" xfile

<string>https://www.new.home.biz?A=XYZ&amp;B=123&amp;C=987_jjj%2XD</string>
  • ${after//\&/\\\&} : in the after variable, replace all occurrences of & with \&

This eliminates the need to go through and manually escape all occurrences of & in the replacement string.
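
If you can't rely on bash's ${var// /} construct, a similar effect can be had with sed itself. The following is only a sketch: the helper name escape_replacement is made up for this example, it assumes # is the s command's delimiter, and it assumes the replacement string contains no newlines. It escapes backslashes and the # delimiter as well as &, since those are also special in the replacement part.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer above): backslash-escape every
# character that is special in the replacement part of s#...#...#,
# namely backslash, & and the delimiter #.
escape_replacement() {
    printf '%s' "$1" | sed 's/[\\&#]/\\&/g'
}

after='new.home.biz?A=XYZ&amp;B=123&amp;C=987_jjj%2XD'
escaped=$(escape_replacement "$after")

# The search pattern is written out by hand here; only the replacement is escaped.
printf '%s\n' '<string>https://www.old.home.com?x=123&amp;y=abc&amp;z=ABC_mmc%3D</string>' |
    sed "s#old\.home\.com?x=123&amp;y=abc&amp;z=ABC_mmc%3D#${escaped}#g"
# prints <string>https://www.new.home.biz?A=XYZ&amp;B=123&amp;C=987_jjj%2XD</string>
```

Note that this only protects the replacement side. A search pattern held in a variable would need its own escaping for regex metacharacters such as `.` and `*`.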

markp-fuso
  • Hmmm.. you're right. It would be a pain in the keister doing that manually! I thought about looping over the individual characters in the string and replacing them that way.. but that's resource intensive. My examples have forward slashes (`/`) inside them; how would the `${var// /}` construct you suggested work with that? – user3124081 Sep 08 '17 at 15:31
  • The `//` and `/` are predefined delimiters so all you would need to do is escape your search/replace patterns (as needed), eg: `x='a/c' ; echo ${x//\//X} => aXc ; x='a\c' ; echo ${x//\\/X} => aXc` – markp-fuso Sep 08 '17 at 15:48