I'm trying to get the page title from XML feeds.

I'm using http://feeds.gawker.com/lifehacker/full as an example. The code below works with other sites, but for Lifehacker it seems to ignore the closing </title> tag: console.log shows the entire content of the XML feed after the opening <title>.

function getTitle($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        preg_match("/\<title\>(.*)<\/title\>/",$str,$title);
        return $title[1];
    }
}

$feed = 'http://feeds.gawker.com/lifehacker/full';
$pagetitle = getTitle($feed);

Thanks

ngplayground

2 Answers

Don't use regex to parse XML or HTML. Try this instead; it's simpler and neater:

$feed = simplexml_load_file('feed.xml');

var_dump((string)$feed->channel->title);
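
To make the snippet above easier to try without a feed file on disk, here is a minimal sketch that parses an inline string instead. The sample XML is hypothetical and only assumes an RSS 2.0 layout (rss > channel > title); the real Lifehacker feed may differ. The variable names $xml and $feedTitle are mine, not from the question.

```php
<?php
// Hypothetical sample feed (assumption: RSS 2.0 layout), standing in for the real one.
$xml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>
XML;

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$feed = simplexml_load_string($xml);

if ($feed === false) {
    // Parsing failed; libxml_get_errors() holds the details.
    $feedTitle = null;
} else {
    // The channel-level title, not the <title> of any <item>.
    $feedTitle = (string) $feed->channel->title;
}

echo $feedTitle, PHP_EOL;
```

For a remote feed, simplexml_load_file() should also accept a URL when allow_url_fopen is enabled, and the same false check applies.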

silkfire

Personally, I would recommend against using regular expressions to parse XML documents. They are simply not suited to it.

Instead, have a look at SimpleXML or DOM.
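
As a rough sketch of the DOM route, assuming an RSS 2.0 layout (the sample XML and the names $rssXml and $domTitle are hypothetical, not taken from the question):

```php
<?php
// Hypothetical sample feed (assumption: RSS 2.0 layout).
$rssXml = <<<'XML'
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title></item>
  </channel>
</rss>
XML;

// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$domTitle = null;

if ($doc->loadXML($rssXml)) {
    $xpath = new DOMXPath($doc);
    // Select only the channel-level <title>, ignoring the <item> titles.
    $nodes = $xpath->query('/rss/channel/title');
    if ($nodes !== false && $nodes->length > 0) {
        $domTitle = $nodes->item(0)->textContent;
    }
}

echo $domTitle, PHP_EOL;
```

The XPath expression pins down exactly which title you mean, which a regex cannot do reliably.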

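Now, what is wrong with your regular expression is that the star is greedy by default: .* matches as much as it can, so the capture runs through to the last </title> it can reach instead of stopping at the first. Adding a ? makes it lazy: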

preg_match("/\<title\>(.*?)<\/title\>/",$str,$title);

will get you what you are after. But keep in mind that your code will only return the first title element in the document.
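
To see the difference between the greedy and lazy versions, here is a small sketch on a made-up single-line string (the sample text and variable names are hypothetical):

```php
<?php
// Hypothetical sample with two <title> elements on one line.
$str = '<channel><title>Example Feed</title><item><title>First post</title></item></channel>';

// Greedy: .* runs on to the LAST </title> in the string.
preg_match('/<title>(.*)<\/title>/', $str, $greedy);

// Lazy: .*? stops at the FIRST </title> after the opening tag.
preg_match('/<title>(.*?)<\/title>/', $str, $lazy);

// preg_match_all collects every title rather than just the first one.
preg_match_all('/<title>(.*?)<\/title>/s', $str, $all);

echo $greedy[1], PHP_EOL; // Example Feed</title><item><title>First post
echo $lazy[1], PHP_EOL;   // Example Feed
print_r($all[1]);         // both titles, in document order
```

The s modifier on the last pattern lets the dot match newlines too; without it, a title that spans several lines would not match at all.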

More on regular expressions at this excellent reference site:

http://www.regular-expressions.info/

The Mighty Rubber Duck