How do I get sed to substitute the first occurrence of a complex string ending in a multi-character delimiter?

Question

I have a long line of text and HTML tags. I want to use sed to replace everything from the string 'MYSTART' up to and including the first occurrence of </p> after it. The replacement text is RESULTAFTERSUBSTITUTIONWORKS.

I've been fumbling with regular expressions and keep hitting a brick wall. I also tried a few regex test sites, but what they report as a match doesn't work in sed for me, with or without '-r'.

cat myfile | sed -r 's/MYSTART.*?<\/p>/RESULTAFTERSUBSTITUTIONWORKS/'

My sample string looks something like this:

THISSHOULDBEIGNORED_MYSTART<ac>blah</ac><another>lots of things 123 abc :</another></p><div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>

After substitution it would look like this:

THISSHOULDBEIGNORED_RESULTAFTERSUBSTITUTIONWORKS<div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>

In case perl is an option: https://stackoverflow.com/questions/1103149/non-greedy-reluctant-regex-matching-in-sed/1103177#1103177 — jas, Jun 30 '19 at 19:45

Ed Morton · Accepted Answer · 2019-06-30T19:57:06.887

With any sed that recognizes \n as meaning <newline>:

$ sed 's:</p>:\n:; s/MYSTART.*\n/RESULTAFTERSUBSTITUTIONWORKS/' file
THISSHOULDBEIGNORED_RESULTAFTERSUBSTITUTIONWORKS<div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>
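
For anyone wanting to try this without a file, here is a minimal sketch of the same two-step idea. It assumes GNU sed, or any other sed that treats \n in the replacement as a newline. The shortened sample line and the variable names `input` and `result` are made up for illustration. sed's POSIX regular expressions have no non-greedy quantifier, so the trick is to mark the first </p> with a character that cannot otherwise appear inside a line, then let the greedy .* stop there.

```shell
#!/bin/sh
# Hypothetical sample line, shortened from the one in the question.
input='IGNORED_MYSTART<ac>blah</ac></p><div><p>welcome</p></div>more</p>'

# Step 1: replace only the first </p> with a newline. A line read by sed
#         never contains a newline, so it works as a unique marker.
# Step 2: MYSTART.*\n can only end at that single newline, so the greedy
#         .* cannot run past the first </p>.
result=$(printf '%s\n' "$input" |
  sed 's:</p>:\n:; s/MYSTART.*\n/RESULTAFTERSUBSTITUTIONWORKS/')

printf '%s\n' "$result"
```

Only the first </p> is turned into a marker and that marker is consumed by the second substitution, so the later </p> tags pass through untouched.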

If you can have </p>s before your start string then it'd be more like this (untested):

sed 's:</p>:\n:g; s/MYSTART[^\n]*\n/RESULTAFTERSUBSTITUTIONWORKS/; s:\n:</p>:g' file
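
A small sketch to exercise the three-step command above on a line that has a </p> before MYSTART. It assumes GNU sed, where \n in the replacement is a newline and [^\n] means "any character except newline"; other seds may behave differently. The sample line and the variable names `input` and `result` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample line with a </p> occurring before MYSTART.
input='<p>intro</p>X_MYSTART<ac>blah</ac></p><div><p>welcome</p></div>more</p>'

# Step 1: turn every </p> into a newline marker.
# Step 2: [^\n]*\n matches from MYSTART up to the nearest marker after it.
# Step 3: turn the remaining markers back into </p>.
result=$(printf '%s\n' "$input" |
  sed 's:</p>:\n:g; s/MYSTART[^\n]*\n/RESULTAFTERSUBSTITUTIONWORKS/; s:\n:</p>:g')

printf '%s\n' "$result"
```

The </p> before MYSTART survives the round trip because step 3 restores every marker that step 2 did not consume.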
