
The question is: given a site URL (say https://stackoverflow.com/), how do I return the list of all the feeds available on that site? Acceptable methods:

a) use a 3rd-party service (Google?, Yahoo?, ...) programmatically
b) use a crawler/spider (with some tips on how to configure the spider to return only the RSS/XML feeds)
c) do it programmatically in C/C++/PHP (any language/library)

The task here is not to get the feeds linked from the page returned by that URL, but ALL the feeds available on the server, at any depth. In any case, please provide a simple usage example.

ktolis
This may be a duplicate of the question How to discover RSS feeds for a given URL (http://stackoverflow.com/questions/61535/how-to-discover-rss-feeds-for-a-given-url), with this answer covering how to find all the RSS feeds in a variety of ways: programmatically with PHP, via a 3rd-party service, etc.: http://stackoverflow.com/questions/61535/how-to-discover-rss-feeds-for-a-given-url/61546#61546 – Ellie Kesselman Jul 12 '11 at 19:08

1 Answer


The only way I know of doing this is to rely on the RSS auto-discovery convention, which has been around for about 4 years. Crawl the site and look in the HTML pages for the RSS auto-discovery tags:

<link rel="alternate" type="application/rss+xml" 
      title="Something" 
      href="http://www.example.com/feed1.xml" />
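
A minimal sketch of that discovery step in PHP (one of the languages the question mentions), using the built-in DOMDocument parser. The function name discoverFeeds and the Atom handling are illustrative assumptions, not part of the original answer:

<?php
// Fetch one page and return the href of every RSS/Atom auto-discovery tag.
function discoverFeeds($url)
{
    $html = @file_get_contents($url);
    if ($html === false) {
        return array();
    }

    $doc = new DOMDocument();
    @$doc->loadHTML($html);          // suppress warnings from sloppy real-world markup

    $feeds = array();
    foreach ($doc->getElementsByTagName('link') as $link) {
        $rel  = strtolower($link->getAttribute('rel'));
        $type = strtolower($link->getAttribute('type'));
        if ($rel === 'alternate' &&
            ($type === 'application/rss+xml' || $type === 'application/atom+xml')) {
            $feeds[] = $link->getAttribute('href');
        }
    }
    return $feeds;
}

// Usage example: list the auto-discovery feeds on a single page.
print_r(discoverFeeds('https://stackoverflow.com/'));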
Cheeso
It's the same idea; just also look at anchor tags. Request the home page, store any RSS tags, then look for anchor tags, request those pages, and repeat. – Cheeso May 05 '10 at 00:48
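
Building on the hypothetical discoverFeeds() sketch above, here is a rough breadth-first crawl loop in the spirit of that comment. The page limit, the same-host check, and the omission of relative-URL resolution are simplifying assumptions to keep the example short:

<?php
// Breadth-first crawl: collect feed URLs from each page, then follow its anchors.
// Relative links would need to be resolved against the page URL first; that step
// (and re-fetching each page only once) is omitted here for brevity.
function crawlForFeeds($startUrl, $maxPages = 50)
{
    $queue   = array($startUrl);
    $visited = array();
    $feeds   = array();
    $host    = parse_url($startUrl, PHP_URL_HOST);

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        // Store any RSS auto-discovery tags found on this page.
        $feeds = array_merge($feeds, discoverFeeds($url));

        $html = @file_get_contents($url);
        if ($html === false) {
            continue;
        }
        $doc = new DOMDocument();
        @$doc->loadHTML($html);

        // Queue absolute links that stay on the same host.
        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = $a->getAttribute('href');
            if ($href && parse_url($href, PHP_URL_HOST) === $host) {
                $queue[] = $href;
            }
        }
    }
    return array_unique($feeds);
}

// Usage example: crawl a site (up to the page limit) and list the feeds found.
print_r(crawlForFeeds('https://stackoverflow.com/'));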