Search text matching a pattern inside an XML tag

Question

I have a file that contains XML. Each line holds one root element with a couple of child elements inside it. The structure looks like this:

<document><title>some title1</title><abstract>Some abstract1</abstract></document>
<document><title>some title2</title><abstract>Some abstract2</abstract></document>
<document><title>some title3</title><abstract>Some abstract3</abstract></document>
<document><title>some title4</title><abstract>Some abstract4</abstract></document>

Now I have to find all lines where a given tag contains a particular word, e.g. get all lines that contain abstract1 inside the <abstract> tag.

How can I do this with grep, awk, or sed?

Yes, or something like just `abstract1`, but it should be present inside the tag — Sudar, Mar 20 '13 at 05:23
Seems like a mention of [this famous question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) is probably in order. — Daniel Lyons, Mar 20 '13 at 05:24

score 3 · Accepted Answer · answered Mar 20 '13 at 05:24

Using sed:

sed -n '/<abstract>[^<]*abstract1/p' input
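
The `[^<]*` part matches any run of characters that does not open a new tag, so `abstract1` has to appear in the text that directly follows `<abstract>`. A minimal sketch for trying this out, assuming a temporary sample file; the second data line (word present only in the title) and the variable names `tmp` and `matches` are made up for illustration:

```shell
# Build a small sample file (temporary path; the second line is a hypothetical
# case where the word appears only inside <title>).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<document><title>some title1</title><abstract>Some abstract1</abstract></document>
<document><title>abstract1 in title only</title><abstract>Some abstract2</abstract></document>
EOF

# -n suppresses sed's default output; p prints only the lines the pattern matches.
matches=$(sed -n '/<abstract>[^<]*abstract1/p' "$tmp")
printf '%s\n' "$matches"
rm -f "$tmp"
```

Only the first line should be printed, since the second line has `abstract1` in its title but not after `<abstract>`.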

perreal

bhab · Answer 2 · 2013-03-20T05:33:10.307

1

Update:

    grep  -nir  "<abstract>.*word.*</abstract>" filename
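
In this command `-n` prints line numbers, `-i` ignores case, and `-r` recurses into directories, which is not needed when searching a single file. Because `.*` can run across tag boundaries, a tighter variant may be preferable. The following is only a sketch under the assumption that each element's text contains no nested tags; the sample file and the names `tmp` and `result` are illustrative:

```shell
# Sample data taken from the question, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<document><title>some title1</title><abstract>Some abstract1</abstract></document>
<document><title>some title2</title><abstract>Some abstract2</abstract></document>
EOF

# -n prefixes each matching line with its line number.
# [^<]* keeps the match inside the text of a single <abstract> element.
result=$(grep -n '<abstract>[^<]*abstract2[^<]*</abstract>' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Add `-i` back if the search should ignore case.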

edited Mar 20 '13 at 05:33

answered Mar 20 '13 at 05:22

This will give me all lines that contain "your word". But I want to find only lines that contain "your word" inside a particular tag like <abstract>. – Sudar Mar 20 '13 at 05:23
The updated code works, but I had already accepted the other answer, so I could only upvote this one. – Sudar Mar 20 '13 at 15:45
