I have some XML data in a file, and I need to remove text (not the whole line, so the d command of sed does not fit) from Substring1 up to Substring2, inclusive, but only if the text between them contains a given pattern. The problem is that the formatting varies: Substring1 and Substring2 can be on the same line or on different lines, and there can be several Substring1/Substring2 pairs on the same line.
Example (line 1: two Substring1/Substring2 pairs, the first of which contains PATTERN; line 2: one pair with PATTERN; line 3: one pair without PATTERN; lines 4 and 5: one pair with PATTERN; lines 6 and 7: one pair without PATTERN):
Substring1 = <?xml
Substring2 = </update>
Pattern = PATTERN
Contents of tmp.log:
<?xml version="1.0" encoding="UTF-8" PATTERN-line1 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update><?xml version="1.0" encoding="UTF-8" blah-blah-blah-line1 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update>
<?xml version="1.0" encoding="UTF-8" PATTERN-line2 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update>
<?xml version="1.0" encoding="UTF-8" blah-blah-blah-line3 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update>
<?xml version="1.0" encoding="UTF-8" PATTERN-line4 <upd_date>2016-03-24</upd_date>
<upd_time>00:01:00.200</upd_time> blah-blah-blah-line5 </update>
<?xml version="1.0" encoding="UTF-8" blah-blah-blah-line6 <upd_date>2016-03-24</upd_date>
<upd_time>00:01:00.200</upd_time> blah-blah-blah-line7 </update>
Expected output:
<?xml version="1.0" encoding="UTF-8" blah-blah-blah-line1 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update>
<?xml version="1.0" encoding="UTF-8" blah-blah-blah-line3 <upd_date>2016-03-24</upd_date><upd_time>00:01:00.200</upd_time> blah-blah-blah </update>
<?xml version="1.0" encoding="UTF-8" blah-blah-blah-line6 <upd_date>2016-03-24</upd_date>
<upd_time>00:01:00.200</upd_time> blah-blah-blah-line7 </update>
I've tried different combinations, without full success, such as the following:
sed -i "s#<?xml.*PATTERN.*</update>##g" tmp.log
sed -i "#<?xml#{p; :a; N; #</update>#!ba; s#.*\n##}; p" tmp.log
perl -pi -e 's/<?xml.*PATTERN.*update>//' tmp.log
As far as I can see, these remove whole lines and skip the case where the substrings are on different lines. I also do not really check for PATTERN here. Any help is appreciated.