
It's Friday afternoon, and my brain has frozen!

grep -E -m 1 -o "<title>(.*)</title>" myfile.rss

returns

<title>Some title</title>

How do I just get Some title?

tdc
    Bash doesn't have a built-in function to parse XML. Consider using PHP or Perl to parse the XML properly; then getting the node value will be easy – ajreal Nov 25 '11 at 15:07
  • Nah, that's complete overkill for the task! – tdc Nov 25 '11 at 15:09
  • 2
    From what I understand, we only want to extract the title content from XML of a known layout, not parse XML. For parsing XML, `xmlstarlet` may be a useful utility. – Michael Krelin - hacker Nov 25 '11 at 15:10
  • There is no perfect way to parse XML using pure bash commands. – ajreal Nov 25 '11 at 15:10
  • 3
    The point is that the OP doesn't need a perfect way. And we're not talking about bash builtins either; `grep` is no bash builtin. – Michael Krelin - hacker Nov 25 '11 at 15:13
  • Quite right @Michael, all I needed was to pull out one tag from the XML to generate filenames in a script. It's working now! – tdc Nov 25 '11 at 15:38

1 Answer


Pipe it further through, for instance:

sed -e 's,.*<title>\([^<]*\)</title>.*,\1,g'
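Combined with the grep from the question, the whole pipeline looks like this (a sketch; the sample RSS content is a placeholder standing in for myfile.rss):

```shell
# Sample input standing in for myfile.rss
printf '<rss><channel><title>Some title</title></channel></rss>\n' > myfile.rss

# grep extracts the first <title>…</title> match;
# sed strips the surrounding tags, keeping only the capture group
grep -E -m 1 -o "<title>(.*)</title>" myfile.rss \
  | sed -e 's,.*<title>\([^<]*\)</title>.*,\1,g'
# prints: Some title
```

Note the `[^<]*` in the sed pattern: it stops the match at the first closing tag, so a line containing several elements won't swallow more than the title text.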
Michael Krelin - hacker
  • Thanks! Brain thawing out ;-) – tdc Nov 25 '11 at 15:10
  • +1, but note that using `sed` to parse XML (or HTML) isn't generally a good idea. It should be done only when the input is well known and doesn't vary unexpectedly. For anything slurped automatically from the internet a proper parser should be used. – sorpigal Nov 25 '11 at 16:34
  • @Sorpigal, I agree completely, see comments to the question itself for details. – Michael Krelin - hacker Nov 25 '11 at 19:50