I typically work with large XML files, and generally do word counts via grep
to confirm certain statistics.
For example, I want to make sure I have at least five instances of widget
in a single XML file (strictly speaking, grep -c counts matching lines rather than individual occurrences) via:
cat test.xml | grep -ic widget
Additionally, I like to be able to log the lines that widget
appears on, i.e.:
cat test.xml | grep -i widget > ~/log.txt
However, the key information I really need is the block of XML code that widget
appears in. An example file may look like:
<test> blah blah
blah blah blah
widget
blah blah blah
</test>
<formula>
blah
<details>
widget
</details>
</formula>
I am trying to get the following output from the sample text above, i.e.:
<test>widget</test>
<formula>widget</formula>
Effectively, for each block of XML that surrounds the arbitrary string widget, I'm trying to get a single line showing only the outermost (highest-level) markup tags of that block.
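To make the intended behaviour concrete, here is a rough sketch of it as a shell function around awk. It is not a real XML parser and not a true one-liner. The function name top_level_blocks is made up for illustration, and the sketch assumes the file looks like the sample: the blocks of interest sit at the top level (no single root element wrapping everything), tags are balanced, and no tag contains a literal > inside an attribute value or comment.

```shell
# Hypothetical helper (name invented for this sketch): top_level_blocks WORD FILE
# Prints <outer>WORD</outer> for every top-level element whose content mentions WORD.
top_level_blocks() {
  awk -v word="$1" '
    BEGIN { needle = tolower(word); depth = 0; found = 0 }

    # Record a hit when text inside an open top-level element contains the word
    # (case-insensitive, like grep -i).
    function scan(text) {
      if (depth > 0 && index(tolower(text), needle) > 0) found = 1
    }

    {
      line = $0
      # Walk through every tag on the line, left to right.
      while (match(line, /<[^>]+>/)) {
        scan(substr(line, 1, RSTART - 1))             # text before the tag
        tag = substr(line, RSTART + 1, RLENGTH - 2)   # tag body without < and >
        line = substr(line, RSTART + RLENGTH)         # rest of the line
        if (tag ~ /^[?!]/ || tag ~ /\/$/) continue    # skip declarations, comments, self-closing tags
        if (tag ~ /^\//) {                            # closing tag
          depth--
          if (depth == 0) {                           # a top-level element just ended
            if (found) printf "<%s>%s</%s>\n", top, word, top
            found = 0
          }
        } else {                                      # opening tag
          if (depth == 0) { top = tag; sub(/[ \t].*/, "", top) }  # keep the name, drop attributes
          depth++
        }
      }
      scan(line)                                      # text after the last tag
    }
  ' "$2"
}
```

Called as top_level_blocks widget test.xml, this should print the two lines shown above for the sample file. If the real files wrap everything in one root element, every match would be reported under that root, so the depth at which a block counts as "top level" would need adjusting.
<formula>
blah
<details>
widget
</details>
</formula>
<other>
nothing here
</other>
EOF
out=$(top_level_blocks widget "$tmp")
expected="<test>widget</test>
<formula>widget</formula>"
[ "$out" = "$expected" ] || exit 1
none=$(top_level_blocks gadget "$tmp")
[ -z "$none" ] || exit 1
rm -f "$tmp"
</test>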
Does anyone have any suggestions for implementing this via a command-line one-liner?
Thank you.