I have a webpage that connects to an external site and saves some of the information from its RSS feed into MySQL every time I visit the page. The problem is that the site updates its RSS feed daily, so if I forget to visit my webpage one day, that day's information from the external feed is lost. Is there a way to retrieve yesterday's RSS feed from a website that updates its feed daily?
-
An RSS feed doesn't typically erase previous days' articles. Does the URL contain some parameter that specifies you only want "today's" articles? – Mark W Aug 24 '15 at 11:35
-
http://export.arxiv.org/rss/astro-ph.IM This is an example of an RSS feed I am interested in. – user3741635 Aug 24 '15 at 11:44
2 Answers
The problem here is that the example feed you give doesn't include pubDate as a sub-element of each item, which normally helps RSS readers detect new items. The feed also has a date fault.
If you are (or were) in an auto-publishing process (e.g. RSS to WordPress), you could set up a cron job on your web server that says, in effect, "check whether there's a new item and, if so, fetch it".
Sorry, my coding skills are not sufficient to explain how; on a site I manage, a plugin does this task.
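For illustration, a minimal sketch of the "check whether there's a new item" step might look like the following. It is not the plugin mentioned above, only one possible approach under stated assumptions: items are de-duplicated by their link (since the feed has no pubDate), the table name `items` and the helper names are hypothetical, and `sqlite3` stands in for MySQL so the sketch runs with the standard library alone.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any XML namespace so RSS 1.0 (RDF) and RSS 2.0 items both match."""
    return tag.rsplit("}", 1)[-1]


def fetch(url):
    """Download the raw feed bytes (call this from the scheduled job)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def parse_items(feed_xml):
    """Return a list of (link, title) pairs, one per <item> that has a link."""
    items = []
    for element in ET.fromstring(feed_xml).iter():
        if local_name(element.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in element}
        if fields.get("link"):
            items.append((fields["link"], fields.get("title", "")))
    return items


def store_new_items(connection, feed_xml):
    """Insert items whose link has not been stored yet; return how many were new."""
    # Assumption: the link is unique per item, so it serves as the primary key.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS items (link TEXT PRIMARY KEY, title TEXT)"
    )
    new_count = 0
    for link, title in parse_items(feed_xml):
        cursor = connection.execute(
            "INSERT OR IGNORE INTO items (link, title) VALUES (?, ?)", (link, title)
        )
        new_count += cursor.rowcount  # 1 if inserted, 0 if the link already existed
    connection.commit()
    return new_count
```

A scheduled script would then call something like `store_new_items(connection, fetch("http://export.arxiv.org/rss/astro-ph.IM"))`. Because already-stored links are ignored, running it more often than the feed changes is harmless. With MySQL, `INSERT IGNORE` on a unique column would play the same role as `INSERT OR IGNORE` does here.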

Your approach is probably the very first step :) But as you've already identified, it creates issues, like the need to load your reader quite often to make sure you never miss data. It's also quite slow (and impossible to scale efficiently once you start having hundreds of RSS feeds...).
You could check this question and my answer there.
TL;DR: run a cron job daily/hourly to make sure you don't miss updates. Then implement things like PubSubHubbub so you learn about feed updates as they happen, rather than polling the feeds :)
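As a rough sketch of the PubSubHubbub side, assuming the feed advertises a hub through a `<link rel="hub">` element (not every feed does, and whether the feed in the question does is not confirmed here), a subscriber first discovers the hub and then sends it a form-encoded subscribe request. The function names and URLs below are hypothetical; the `hub.mode`, `hub.topic` and `hub.callback` parameter names come from the PubSubHubbub specification.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def find_hub(feed_xml):
    """Return the href of the first <link rel="hub"> in the feed, or None."""
    for element in ET.fromstring(feed_xml).iter():
        is_link = element.tag.rsplit("}", 1)[-1] == "link"
        if is_link and element.get("rel") == "hub":
            return element.get("href")
    return None


def subscription_body(topic_url, callback_url):
    """Build the form-encoded body of a subscribe request for the hub."""
    return urllib.parse.urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed URL to be notified about
        "hub.callback": callback_url,  # a public URL on your own server
    })
```

The body would be sent as an HTTP POST to the discovered hub URL. The hub then verifies the subscription with a GET request to the callback carrying a `hub.challenge` value that the callback must echo back, so the callback has to be publicly reachable. If `find_hub` returns `None`, polling from a cron job remains the fallback.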
