
I have a duplicated block of text that I need to delete from a large XML file. I want to keep the first block and delete the second, both within the same XML tag. For example:

<!--#if--> 
 -- several lines of text
<!--#else-->
-- several lines of the same text
<!--#endif-->

I'd like to delete the second block, between the #else and #endif, and keep the block between the #if and #else. Any help is much appreciated; my script below ends up deleting the entire file:

sed -i '/^<!--#else-->/ {p; :a; N; /^\<\!--\#endif--\>/!ba; s/*.\n//}; d' test.xml
Maroun
user2167052
  • [Obligatory link](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Use an XML parsing library. – Boris the Spider Jan 05 '15 at 08:18
  • What is the expected output? Give a clearer example. – nu11p01n73R Jan 05 '15 at 08:20
  • The expected output is everything outside the blocks plus the text between #if and #else, i.e. I only want to delete the duplicate text between #else and #endif. – user2167052 Jan 05 '15 at 08:54

1 Answer


I think this should work for you:

sed '/--#else--/,/--#endif--/{//!d}' test.xml

This deletes the lines between the #else and #endif markers while keeping the two marker lines themselves.
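The `//` inside `{//!d}` is sed's empty regex, which reuses the last regex sed applied. On the #else line that is `/--#else--/`; on every later line of the range sed is testing `/--#endif--/`, so only the two marker lines match and survive the `!d`. A quick check, assuming GNU sed and a made-up `sample.xml` modeled on the question:

```shell
# Illustrative input modeled on the question (file name and contents are made up).
cat > sample.xml <<'EOF'
<root>
<!--#if-->
first copy
<!--#else-->
second copy
<!--#endif-->
</root>
EOF

# Keep the #else and #endif marker lines, delete everything between them.
sed '/--#else--/,/--#endif--/{//!d}' sample.xml
```

This should print the sample with `second copy` removed and both markers still in place.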

If you want to delete the #else and #endif lines as well, use this:

sed '/--#else--/,/--#endif--/d' test.xml
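With the markers deleted too, the #if line is left without a matching #endif, which may matter if anything downstream still interprets these directives. A sketch on an illustrative sample, assuming GNU sed (the file name and contents are made up):

```shell
# Illustrative input; the file name and contents are made up.
printf '%s\n' '<!--#if-->' 'first copy' '<!--#else-->' 'second copy' '<!--#endif-->' > sample.xml

# Delete the whole range, including the #else and #endif marker lines.
sed '/--#else--/,/--#endif--/d' sample.xml
```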

For the case you mentioned in the comments, try this:

sed -n '/--#else--/,/--#endif--/p' test.xml

-n means "don't print by default", so only lines selected with p are printed, while //!d is what does the deleting in the first command.
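Because `-n` suppresses the automatic printing of every line, `p` and `!p` decide which side of the range is output: `p` prints only the #else block, while `!p` prints everything outside it, the same result as the `d` form. A small comparison, assuming GNU sed and a made-up `sample.xml`:

```shell
# Illustrative input; the file name and contents are made up.
printf '%s\n' 'before' '<!--#else-->' 'dup' '<!--#endif-->' 'after' > sample.xml

# -n plus p: print only the range (the #else block, markers included).
sed -n '/--#else--/,/--#endif--/p' sample.xml

# -n plus !p: print everything outside the range,
# which matches the output of sed '/--#else--/,/--#endif--/d'.
sed -n '/--#else--/,/--#endif--/!p' sample.xml
```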

aelor
  • Thank you, I thought I had overcomplicated it. Running it, it seems to work, but how can I capture the result and write it to a new file? -i.bak does not seem to work; it leaves the two files the same size. – user2167052 Jan 05 '15 at 09:04
  • Damn, it's not working for me. If I just run the command above, I can see the block being deleted as the output scrolls. However, if I try to output to a new file or use -i.bak, I end up with two files of exactly the same size: nothing is deleted. I also tried cat test.xml | sed '/--#else--/,/--#endif--/{//!d}' > new.xml and got two files of exactly the same size. – user2167052 Jan 05 '15 at 09:52
  • sed '/--#else--/,/--#endif--/{//!d}' deletes the lines but doesn't output anything; use the last command I gave and you will get the result as output in your terminal, which you can then redirect to a file using `>`. – aelor Jan 05 '15 at 10:06
  • @user2167052 Glad that works for you. You can mark the answer as correct if you feel your query has been resolved. – aelor Jan 05 '15 at 10:55
  • What if you want to delete the #else but not the #endif? – arya Apr 08 '16 at 16:31
  • @arya In that case, why not first find (and delete) the text within the else and endif block, and then add the endif back at the end? – aelor May 04 '16 at 11:55
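sed writes its result to standard output and leaves the input file alone, so the result can be saved either by redirecting with `>` or by editing in place with `-i`. A sketch, assuming GNU sed and a made-up `sample.xml`:

```shell
# Illustrative input; the file name and contents are made up.
printf '%s\n' '<!--#if-->' 'first copy' '<!--#else-->' 'second copy' '<!--#endif-->' > sample.xml

# Option 1: keep sample.xml unchanged and write the result to new.xml.
sed '/--#else--/,/--#endif--/{//!d}' sample.xml > new.xml

# Option 2: edit sample.xml in place; the original is saved as sample.xml.bak.
sed -i.bak '/--#else--/,/--#endif--/{//!d}' sample.xml

# Compare byte counts: the edited copies should be smaller than the backup.
wc -c sample.xml.bak new.xml sample.xml
```

If both files come out the same size, nothing was deleted, so a reasonable first check is whether the start marker matches at all, e.g. `grep -c -e '--#else--' test.xml`.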
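As an alternative to re-adding the #endif afterwards, the closing marker can be excluded inside the range: within the `{...}` group, `/--#endif--/!d` deletes every line of the range except the #endif line. A sketch, assuming GNU sed and a made-up `sample.xml`:

```shell
# Illustrative input; the file name and contents are made up.
printf '%s\n' '<!--#if-->' 'first copy' '<!--#else-->' 'second copy' '<!--#endif-->' > sample.xml

# Delete from the #else line up to (but not including) the #endif line.
sed '/--#else--/,/--#endif--/{/--#endif--/!d}' sample.xml
```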