Replace string in XML from bash without writing regex

Question

I have a large flat file containing many instances of a repeated string I would like to remove:

<content type="html">
  &lt;p&gt; &lt;/p&gt;
  &lt;p&gt;Jump around on couch, meow constantly until given food.&lt;/p&gt;
  &lt;p&gt; &lt;/p&gt;
</summary>

Because you can't parse [X]HTML with regex, I'm looking for a solution where I don't have to write my own regex. I tried using tr, without any luck. Here's my desired output:

<content type="html">

  &lt;p&gt;Jump around on couch, meow constantly until given food.&lt;/p&gt;

</summary>

How can I remove the repeating string from bash without writing regex?

Since it is XML, look into xmlstarlet: https://stackoverflow.com/tags/xmlstarlet/info. I haven't used it personally, so I don't know how it can be used for this case. – Sundeep Jul 12 '17 at 12:56

vhs · Accepted Answer · 2017-07-12T19:14:50.003

-1

I used a tool called rpl which didn't require me to write any regex:

$ rpl '&lt;p&gt; &lt;/p&gt;' '' /tmp/file

Really DELETE all occurences of &lt;p&gt; &lt;/p&gt; (case sensitive)? (Y/[N]) Y
Replacing "&lt;p&gt; &lt;/p&gt;" with "" (case sensitive) (partial words matched)
A Total of 55 matches replaced in 1 file searched.

I installed it via Homebrew with brew install rpl and was finished in 2 minutes.
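If installing an extra tool is not an option, bash itself can do a literal (non-regex) replacement through parameter expansion. The following is only a minimal sketch, not part of the original answer: strip_literal is a hypothetical helper name, and it assumes bash (not plain sh) and a file that can be processed line by line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): print FILE with every
# literal occurrence of NEEDLE removed. No regex is involved.
strip_literal() {
  local needle=$1 file=$2 line
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Quoting "$needle" inside ${line//.../} makes bash treat it as a
    # literal string rather than as a glob pattern.
    printf '%s\n' "${line//"$needle"/}"
  done < "$file"
}
```

Usage would look like strip_literal '&lt;p&gt; &lt;/p&gt;' /tmp/file > /tmp/file.new, writing the result to a new file instead of editing in place. A shell read loop is slow on very large files, so this is best suited to cases where rpl or sed is unavailable.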

edited Jul 12 '17 at 19:14

answered Jul 11 '17 at 05:28


score -1 · Answer 2 · answered Jul 11 '17 at 11:55


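With some knowledge of regular expressions, sed can do it. The search string contains no regex metacharacters, so it matches literally; ~ is used as the delimiter because the string itself contains a /, and -i.bck edits the file in place while keeping a backup with the .bck suffix: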

sed -i.bck 's~&lt;p&gt; &lt;/p&gt;~~g' /tmp/file
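Before and after an in-place edit like this, it can help to count the literal occurrences so the result can be checked against what the tool reports. A small sketch, assuming a grep that supports -o (both GNU and BSD grep do, although it is not in POSIX); count_literal is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count literal occurrences of NEEDLE in FILE.
# -F treats the pattern as a fixed string (no regex interpretation),
# -o prints each match on its own line so that wc -l counts matches
# rather than matching lines, and -- protects needles starting with "-".
count_literal() {
  grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}
```

Running count_literal '&lt;p&gt; &lt;/p&gt;' /tmp/file before the replacement gives the number of matches to expect, and running it again afterwards should print 0.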


hek2mgl


Thanks for providing a solution. I've updated the question to try and make it more clear what I'm trying to achieve and why RegExp may not be the best approach for my needs. – vhs Jul 12 '17 at 13:13
