I have a long line of text and HTML tags. I want to use sed to replace everything from the string 'MYSTART' up to and including the first occurrence of </p> after it. The replacement text is RESULTAFTERSUBSTITUTIONWORKS.

I've been fumbling with regular expressions and keep hitting a brick wall. I also tried a few regex test sites, but what they report as a match doesn't work in sed for me, with or without '-r'.

cat myfile | sed -r 's/MYSTART.*?<\/p>/RESULTAFTERSUBSTITUTIONWORKS/'

My sample string looks something like this:

THISSHOULDBEIGNORED_MYSTART<ac>blah</ac><another>lots of things 123 abc :</another></p><div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>

After substitution it would look like this:

THISSHOULDBEIGNORED_RESULTAFTERSUBSTITUTIONWORKS<div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>

Rick
  • In case perl is an option: https://stackoverflow.com/questions/1103149/non-greedy-reluctant-regex-matching-in-sed/1103177#1103177 – jas Jun 30 '19 at 19:45

1 Answer

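sed's regular expressions have no non-greedy quantifier, so `.*?` won't stop at the first </p>. Instead, turn the first </p> into a newline (a character that can't otherwise be present in the line sed is processing) and match up to that. With any sed that recognizes \n as meaning <newline>: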

$ sed 's:</p>:\n:; s/MYSTART.*\n/RESULTAFTERSUBSTITUTIONWORKS/' file
THISSHOULDBEIGNORED_RESULTAFTERSUBSTITUTIONWORKS<div><ac>another thing</another><p>welcome home to somewhere</p></div>the line keeps going and going</p><p>paragraph</p>
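
To see what each of the two substitutions does, they can be run one at a time. This is only a sketch: it assumes GNU sed (or another sed that accepts \n in both the regex and the replacement), and the shortened sample line and the variable names `line`, `marked` and `result` are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU sed; the sample line and variable names are illustrative only.
line='THISSHOULDBEIGNORED_MYSTART<ac>blah</ac></p><div><p>welcome</p></div>'

# Step 1 alone: only the FIRST </p> becomes a newline (no g flag),
# so the output is split across two lines at that point.
marked=$(printf '%s\n' "$line" | sed 's:</p>:\n:')
printf 'after step 1:\n%s\n' "$marked"

# Steps 1 and 2 together: the greedy .* can only run as far as the single
# newline, so the match ends where the first </p> used to be.
result=$(printf '%s\n' "$line" |
  sed 's:</p>:\n:; s/MYSTART.*\n/RESULTAFTERSUBSTITUTIONWORKS/')
printf 'after step 2:\n%s\n' "$result"
```

The later </p> in the sample is left alone because it was never converted to a newline.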

If you can have </p>s before your start string then it'd be more like this (untested):

sed 's:</p>:\n:g; s/MYSTART[^\n]*\n/RESULTAFTERSUBSTITUTIONWORKS/; s:\n:</p>:g'
Ed Morton
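
Since the second command is marked as untested, here is one way it might be checked. This is a sketch under assumptions: it relies on GNU sed, where \n inside a bracket expression is treated as a newline (a strictly POSIX sed may read `[^\n]` as "not a backslash or n"), and the sample line with a </p> before MYSTART is hypothetical, not taken from the question.

```shell
#!/bin/sh
# Assumes GNU sed, where \n inside [...] means newline.
# The sample line is hypothetical: it has a </p> BEFORE the start string.
line='intro</p>x_MYSTART<ac>blah</ac></p><div><p>home</p></div>'

# 1) mark every </p> with a newline,
# 2) replace from MYSTART to the first newline after it,
# 3) turn the remaining newlines back into </p>.
fixed=$(printf '%s\n' "$line" |
  sed 's:</p>:\n:g; s/MYSTART[^\n]*\n/RESULTAFTERSUBSTITUTIONWORKS/; s:\n:</p>:g')

printf '%s\n' "$fixed"
```

With GNU sed the leading `intro</p>` and the trailing `<p>home</p>` should come through unchanged, with only the span from MYSTART to the next </p> replaced.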