
Possible Duplicate:
How To Discover RSS Feeds for a given URL

Given a URL, I'd like to know whether it's a feed or not.

In Zend Framework, it is possible to import a URL as a feed:

try {
    $slashdotRss =
        Zend_Feed::import('http://rss.slashdot.org/Slashdot/slashdot');
} catch (Zend_Feed_Exception $e) {
    // feed import failed
    echo "Exception caught importing feed: {$e->getMessage()}\n";
    exit;
}

And if an exception is thrown, then I know the URL is not a feed.

I would like to implement the same check in Java, so my question is: how does Zend Framework know whether a URL is a feed or not?

Majid Laissi

3 Answers


Open the URL in a browser and have a look at the source. You will notice that it is an XML document in a specific, standardized format (RSS and Atom are the common ones). What Zend Framework (note that Zend is a company) probably does is try to parse this document as a feed. That parse fails when the document is not a valid feed.
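As a rough illustration of "parse it and see", here is a minimal Java sketch that reads only as far as the root element and checks whether it is one of the usual feed roots: `rss` for RSS 2.0, `feed` in the Atom namespace, or `RDF` in the RDF namespace for RSS 1.0. The class and method names (`FeedRootSniffer`, `looksLikeFeed`) are made up for this example, and this is not Zend Framework's actual implementation. A matching root is only a strong hint, since the rest of the document is never validated.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not part of Zend Framework or ROME): guesses whether
// an XML document is a feed by inspecting its root element only.
class FeedRootSniffer {
    private static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    private static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static boolean looksLikeFeed(InputStream in) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                while (reader.hasNext()) {
                    // Skip comments, processing instructions, etc. until the root element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        String name = reader.getLocalName();
                        String ns = reader.getNamespaceURI();
                        return "rss".equals(name)                           // RSS 0.9x / 2.0
                            || ("feed".equals(name) && ATOM_NS.equals(ns))  // Atom
                            || ("RDF".equals(name) && RDF_NS.equals(ns));   // RSS 1.0 (any RDF document matches too)
                    }
                }
                return false; // no element found at all
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return false; // not XML, or broken before the root element
        }
    }

    // Convenience overload for content that is already in memory.
    static boolean looksLikeFeed(String xml) {
        return looksLikeFeed(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

For a real URL you could pass `new URL(feedUrl).openStream()` to `looksLikeFeed`. Because only the first element is read, the check is cheap, but a full feed parser is still needed to confirm that the remainder is well formed.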

KingCrunch
  • Zend Framework is open source, that's why I'd like to know what it does to get the document content or fail if it's not a feed. I'm not sure that an XML document is necessarily a feed. – Majid Laissi Dec 17 '12 at 19:08
  • @jidma That's right, but by "parsing" I don't mean just checking whether it's XML or not, because there are many XML formats out there ;) I meant that you should parse the document as a feed, and if that works, it is probably a feed. For example, an RSS feed has an `<rss>` element as its root. That is a really, _really_ good hint ;) – KingCrunch Dec 17 '12 at 19:11
  • A feed (atom/rss) is an XML document with a specific DTD/schema. – BryanH Dec 17 '12 at 19:22
  • @BryanH At least the Slashdot example doesn't include the DTD or a schema definition. And now that you point it out: it doesn't even have a root namespace defined O_o – KingCrunch Dec 17 '12 at 19:29
  • So even by looking inside the XML we can't figure out whether it's a feed :-/ – Majid Laissi Dec 17 '12 at 20:40

What I'd do is hand it to ROME and try to parse it. If the document fails to parse as a feed, ROME throws a FeedException:

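import java.io.IOException;
import java.net.URL;

// Recent ROME releases; older ones use the com.sun.syndication.io package instead.
import com.rometools.rome.io.FeedException;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

// Returns true if ROME can parse the document at feedUrl as a feed.
// Network errors and malformed URLs still surface as IOException.
public boolean tryFeed(String feedUrl) throws IOException {
    SyndFeedInput input = new SyndFeedInput();
    try (XmlReader reader = new XmlReader(new URL(feedUrl))) {
        input.build(reader);
        return true;
    } catch (FeedException e) {
        // ROME could not parse the document as a feed
        return false;
    }
}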
hd1

I'm not familiar with Zend's internals; however, for the readers I've written, I usually look for the MIME type application/rss+xml.

That is the standard way of determining what a resource is.

Of course, some poorly programmed or improperly configured sources might not adhere to the standards, just as it is possible to set the MIME type for a PNG file to text/javascript or something equally nonsensical.

As a fallback, parsing the file is a viable method, assuming the feed has been formatted properly.
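A minimal Java sketch of that two-step idea, assuming the `Content-Type` header value has already been fetched: a feed-specific type is taken as a yes, a generic XML type means "look inside", and anything else (including a missing or wrong header) is treated as inconclusive instead of a definite no. The class, enum, and method names are hypothetical, and the list of types is an assumption that can be extended.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: interprets a Content-Type header as a hint, not a verdict.
class FeedContentTypes {
    enum Verdict { LIKELY_FEED, GENERIC_XML, INCONCLUSIVE }

    // Media types conventionally served for RSS and Atom feeds.
    private static final Set<String> FEED_TYPES =
            Set.of("application/rss+xml", "application/atom+xml");
    // Generic XML types: the body may or may not be a feed.
    private static final Set<String> XML_TYPES =
            Set.of("text/xml", "application/xml");

    static Verdict classify(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return Verdict.INCONCLUSIVE; // server sent no Content-Type
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String mediaType = contentTypeHeader.split(";", 2)[0]
                .trim()
                .toLowerCase(Locale.ROOT);
        if (FEED_TYPES.contains(mediaType)) {
            return Verdict.LIKELY_FEED;
        }
        if (XML_TYPES.contains(mediaType)) {
            return Verdict.GENERIC_XML;
        }
        // e.g. text/html from a misconfigured server: fall back to parsing the body.
        return Verdict.INCONCLUSIVE;
    }
}
```

The header value can be obtained with `URLConnection.getContentType()`. For anything other than `LIKELY_FEED`, the body would then be handed to a feed parser as the fallback.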

BryanH
  • Also a good idea, but you cannot rely on it. Take the example from the question (http://rss.slashdot.org/Slashdot/slashdot): it returns `text/html` as its MIME type. OK, we both know that this is wrong, but that doesn't help ;) – KingCrunch Dec 17 '12 at 19:01
  • I have `Content-Type: text/xml;` for this feed: http://feeds2.feedburner.com/9GAG .. – Majid Laissi Dec 17 '12 at 19:06
  • @KingCrunch We can't rely on standards?! http://xkcd.com/927/ Sigh... :) – BryanH Dec 17 '12 at 19:07
  • @BryanH If we could rely on standards then we wouldn't be struggling to make JS compatible with IE.. – Majid Laissi Dec 17 '12 at 19:10
  • @BryanH It's more a "we can't rely on _every_ standard in _every_ case". The content type is often wrong because many sites simply forget to set it, and then the web server falls back to its default setting. For example, you can rely on XML (and even HTML(5)) much more than on the content type. Besides: that comic doesn't fit every context where "standard" appears ;) – KingCrunch Dec 17 '12 at 19:14
  • So how would parsing fail? A well-structured XML document wouldn't fail, but what in its contents shows that it's an actual feed? – Majid Laissi Dec 17 '12 at 20:42