
It's Friday afternoon, and my brain has frozen!

grep -E -m 1 -o "<title>(.*)</title>" myfile.rss

returns

<title>Some title</title>

How do I just get Some title?

tdc
    Bash doesn't have a built-in function to parse XML. Consider using PHP or Perl to parse the XML properly; then getting the node value will be easy – ajreal Nov 25 '11 at 15:07
  • Nah, that's complete overkill for the task! – tdc Nov 25 '11 at 15:09
  • 2
    From what I understand, we only want to extract the title content from XML of a known layout, not parse XML. For parsing XML, `xmlstarlet` may be a useful utility. – Michael Krelin - hacker Nov 25 '11 at 15:10
  • There is no perfect way to parse XML using pure bash commands. – ajreal Nov 25 '11 at 15:10
  • 3
    The point is that the OP doesn't need a perfect way. And we're not talking about bash builtins either; `grep` is no bash builtin. – Michael Krelin - hacker Nov 25 '11 at 15:13
  • Quite right @Michael, all I needed was to pull out one tag from the XML to generate filenames in a script. It's working now! – tdc Nov 25 '11 at 15:38

1 Answer


Pipe it further through, for instance:

sed -e 's,.*<title>\([^<]*\)</title>.*,\1,g'
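Combined with the grep from the question, the whole pipeline looks like this (a sketch; the sample RSS content is a placeholder standing in for myfile.rss):

```shell
# Sample input standing in for myfile.rss
printf '<rss><channel><title>Some title</title></channel></rss>\n' > myfile.rss

# grep extracts the first <title>…</title> match;
# sed strips the surrounding tags, keeping only the capture group
grep -E -m 1 -o "<title>(.*)</title>" myfile.rss \
  | sed -e 's,.*<title>\([^<]*\)</title>.*,\1,g'
# prints: Some title
```

Note the `[^<]*` in the sed pattern: it stops the match at the first closing tag, so a line containing several elements won't swallow more than the title text.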
Michael Krelin - hacker
  • Thanks! Brain thawing out ;-) – tdc Nov 25 '11 at 15:10
  • +1, but note that using `sed` to parse XML (or HTML) isn't generally a good idea. It should be done only when the input is well known and doesn't vary unexpectedly. For anything slurped automatically from the internet a proper parser should be used. – sorpigal Nov 25 '11 at 16:34
  • @Sorpigal, I agree completely, see comments to the question itself for details. – Michael Krelin - hacker Nov 25 '11 at 19:50