
I have a URL for the site but not for its feed(s), which I want to parse.

How can I detect whether a site has RSS/Atom feed(s)?

Shmidt

2 Answers


As mentioned in the question "How to check if a site has rss feeds", you need to download the page and check for a <link> tag with rel='alternate'.

If that first pass finds nothing, you could fall back to a regex search of the page for any mention of feed.xml or similar, if you want to be sure to find every possible link to an RSS/Atom feed. This fallback is less reliable: a match might point to an outside feed rather than the feed of the page itself.
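
A rough sketch of that regex fallback in Python, standard library only. The function name `guess_feed_urls` and both patterns are assumptions made for illustration; the heuristic simply flags any href whose path contains "rss", "atom" or "feed" as a separate word, and marks whether the result lives on the same host, since an off-site match may be someone else's feed.

```python
import re
from urllib.parse import urljoin, urlparse

# Heuristic patterns (assumptions, tune them for your own sites).
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
FEED_HINT_RE = re.compile(r'(?:^|[/._-])(?:rss|atom|feed)s?(?:[/._-]|$)', re.IGNORECASE)


def guess_feed_urls(html, base_url):
    """Return (absolute_url, same_host) pairs for hrefs that look like feeds."""
    base_host = urlparse(base_url).netloc.lower()
    results = []
    for href in HREF_RE.findall(html):
        if not FEED_HINT_RE.search(href):
            continue
        url = urljoin(base_url, href)  # resolve relative links
        pair = (url, urlparse(url).netloc.lower() == base_host)
        if pair not in results:  # keep first occurrence, drop duplicates
            results.append(pair)
    return results
```

Candidates found this way are only guesses; it is worth downloading each one and checking that it really parses as RSS or Atom before relying on it.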

Community

Look for a <link> with rel="alternate" and type="application/rss+xml" in the head section of the site's default page:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
  <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://someurl/feed/" />
  <title>Some title</title>
</head>
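
One possible way to do this check in Python, using only the standard library's `html.parser`. The names `FeedLinkParser` and `find_feed_links` are made up for this sketch, and it also accepts the Atom MIME type `application/atom+xml` alongside the RSS one. Relative hrefs are resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that carry a feed MIME type."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def find_feed_links(html, base_url):
    """Return absolute feed URLs advertised by the page, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

An empty list means the page does not advertise a feed this way, which does not prove that no feed exists.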
Dan Byström
  • Can you please tell me how to download the default page? Is it always index.htm? – Shmidt Jul 22 '12 at 10:12
  • That would be the page you get when you, for example, try to download the page http://stackoverflow.com/ over port 80. It may have any name, since that is usually configurable in the web server. – Dan Byström Jul 22 '12 at 10:40
  • In addition, for Atom, check for type="application/atom+xml" – Henley Oct 27 '12 at 21:25
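
On the default-page question in the comments: you do not need to know the file name, because requesting the bare site URL makes the server return whatever page it is configured to serve. A minimal sketch with `urllib.request` follows; the function name `fetch_default_page`, the `opener` parameter (there only so the network call can be swapped out) and the User-Agent string are assumptions for illustration.

```python
from urllib.request import Request, urlopen


def fetch_default_page(site_url, opener=urlopen, timeout=10):
    """Fetch the site's root URL and return (final_url, decoded_html)."""
    # Hypothetical User-Agent; some servers reject requests without one.
    request = Request(site_url, headers={"User-Agent": "feed-finder-sketch/0.1"})
    with opener(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if absent.
        charset = response.headers.get_content_charset() or "utf-8"
        final_url = response.geturl()  # URL after any redirects
        return final_url, response.read().decode(charset, errors="replace")
```

For example, `fetch_default_page("http://stackoverflow.com/")` would return the final URL and the HTML of whatever the server serves at the root, which can then be scanned for feed links.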