How To Discover RSS Feeds for a given URL

Question

I get a URL from a user. I need to know:
a) Is the URL a valid RSS feed?
b) If not, is there a valid feed associated with that URL?

Using PHP/JavaScript or something similar.

(Ex. http://techcrunch.com fails a), but b) would return their RSS feed)

score 20 · Accepted Answer · edited Aug 21 '11 at 11:21

Found something that I wanted:

Google's AJAX Feed API has load feed and lookup feed functions (Docs here).

a) Load feed provides the feed (and feed status) in JSON

b) Lookup feed provides the RSS feed for a given URL

There's also a find feed function that searches for RSS feeds based on a keyword.

Planning to use this with jQuery's $.getJSON

edited Aug 21 '11 at 11:21

Marek Grzenkowicz

answered Sep 14 '08 at 18:45

Gilean

Too bad you have to use Google Feeds API for that. RSS has a simple discovery mechanism based on <link> elements in the <head> section. It's very easy to implement, and doing so removes one dependency on Google. – Julien Genestoux Jun 01 '14 at 20:08

score 10 · Answer 2 · edited Jul 24 '14 at 11:27

The Zend_Feed class of the Zend Framework can automatically parse a web page and list the available feeds.

Example:

$feedArray = Zend_Feed::findFeeds('http://www.example.com/news.html');

edited Jul 24 '14 at 11:27

Farzad

answered Sep 15 '08 at 11:49

ConroyP · Answer 3 · 2008-09-14T19:27:20.513

This link lets you validate the feed against the RSS/Atom specifications using the W3C specs, but it requires you to enter the URL manually.

There are a number of ways to do this programmatically, depending on your choice of language. In PHP, a good start is to parse the file as XML and then validate it against the relevant DTD.

For b), if the link itself isn't a feed, you can parse the page and look for a feed declared in its <head> section by searching for a link whose type is "application/rss+xml", e.g.:

<link rel="alternate" title="RSS Feed" 
    href="http://www.example.com/rss-feed.xml" type="application/rss+xml" />

This type of link is the one most browsers use to "auto-discover" feeds (causing the RSS icon to appear in your address bar).
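As a rough illustration of that lookup, here is a minimal Python sketch using only the standard library. The names FeedLinkParser and find_feed_links are made up for this example, and the list of accepted MIME types (including application/atom+xml, which the answer above does not mention) is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types treated as feed advertisements (an assumed list for this sketch).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute names arrive lowercased; valueless attributes arrive as None.
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Return the feed URLs advertised by <link> tags in the given HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

You would fetch the page yourself and pass its text plus its URL, e.g. find_feed_links(page_html, "http://www.example.com/"). The sketch does not restrict itself to <head>, since real pages are often malformed enough that a strict check would miss links.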

score 5 · Answer 4 · answered Sep 14 '08 at 18:34

a) Retrieve it and try to parse it. If you can parse it, it's valid.

b) Test whether it's an HTML document (the server sent the text/html MIME type). If so, run it through an HTML parser and look for <link> elements with RSS feed relations.
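The MIME-type test in b) could look roughly like this in Python. It is only a sketch: classify_content_type is a hypothetical helper, and which types count as "feed" or "html" is my assumption, not something the answer specifies.

```python
def classify_content_type(header_value):
    """Return 'feed', 'html', or 'unknown' for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = (header_value or "").split(";", 1)[0].strip().lower()
    if mime in ("application/rss+xml", "application/atom+xml"):
        return "feed"
    if mime in ("text/html", "application/xhtml+xml"):
        return "html"
    # text/xml and application/xml are ambiguous: the body still has to be parsed.
    return "unknown"
```

Servers frequently send feeds as text/xml or even text/html, so a result of 'unknown' (or 'html') should fall through to actually parsing the body as in a).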

answered Sep 14 '08 at 18:34

John Millikin

score 4 · Answer 5 · answered Sep 16 '08 at 12:46

For Perl, there is Feed::Find, which automates the discovery of syndication feeds from a web page. The usage is quite simple:

use Feed::Find;
my @feeds = Feed::Find->find('http://example.com/');

It first tries the <link> tags and then scans the <a> tags for files with names ending in .rss or something similar.
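For anyone not using Perl, the second step (scanning <a> tags) can be approximated in a few lines of Python. This is a sketch of the idea only, not a port of Feed::Find: the names AnchorFeedParser and find_feed_anchors are hypothetical and the extension list is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions treated as "looks like a feed" (assumed, not taken from Feed::Find).
FEED_EXTENSIONS = (".rss", ".rdf", ".atom", ".xml")


class AnchorFeedParser(HTMLParser):
    """Collects <a href> targets whose path ends in a feed-like extension."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)
        # Compare only the path so a query string does not hide the extension.
        if urlparse(url).path.lower().endswith(FEED_EXTENSIONS):
            self.candidates.append(url)


def find_feed_anchors(html, base_url):
    """Return URLs linked from <a> tags that look like feed files."""
    parser = AnchorFeedParser(base_url)
    parser.feed(html)
    return parser.candidates
```

The results are only candidates (a .xml link could be a sitemap), so each one still needs to be fetched and checked before you treat it as a feed.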

score 2 · Answer 6 · answered Sep 14 '08 at 18:35

Are you doing this in a specific language, or do you just want details about the RSS specification?

In general, look for the XML prolog:

<?xml version="1.0" encoding="UTF-8"?>

followed by an <rss> element. Beyond that, you might want to validate it as XML, fully validate it against a DTD, or verify, for example, that each URL it refers to is valid. More detail would help.
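A minimal Python sketch of the "parse it and look at the root element" check is below. detect_feed_kind is a name invented for this example, and recognising Atom and RDF (RSS 1.0) roots in addition to <rss> is my addition. It only tests well-formedness and the root tag; it does not validate against a DTD or check any URLs.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_feed_kind(data):
    """Return 'rss', 'atom', 'rdf', or None if data is not a well-formed feed."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        # Not well-formed XML (most HTML pages end up here).
        return None
    if root.tag == "rss":
        return "rss"
    if root.tag == "{%s}feed" % ATOM_NS:
        return "atom"
    if root.tag == "{%s}RDF" % RDF_NS:
        return "rdf"
    return None
```

Since the input comes from a user-supplied URL, keep in mind that the Python documentation warns the standard XML parsers are not hardened against maliciously constructed documents.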

UPDATE: Ah - PHP. I've found this library to be pretty useful: MagpieRSS
