
I am trying to use feedparser to read RSS feeds from reuters.com. I found the feed links at http://in.reuters.com/tools/rss, for example http://feeds.reuters.com/reuters/INtopNews. When I parse a feed in Python with feedparser, I get only about 10 posts:

import feedparser

feeds = feedparser.parse('http://feeds.reuters.com/Reuters/worldNews')

for feed in feeds['entries']:
    print(feed['title'])

But when I view the same link on www.feedreader.com, I can see many more posts as I scroll down. How do I get all of these RSS posts with feedparser in Python?

psr
  • What you're asking for might be here https://stackoverflow.com/questions/28683619/how-can-i-parse-multiple-urls-in-feedparser-python?rq=1 – Nick Duddy Aug 07 '17 at 13:58

2 Answers


You only get 10 items from a Reuters feed because that is all the feed contains. Most RSS feeds hold only the most recent items, not every item going back in time. The feedparser library reads whatever is currently in the feed.

The Reuters feed in your code example contains 10 items.
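
One way to check this yourself is to count the `<item>` elements in the feed document. The sketch below uses only the standard library and a small hand-written RSS sample (the sample text and the `item_titles` helper are illustrative assumptions, not the real Reuters feed), so it shows the idea rather than the actual feed contents:

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written RSS 2.0 document standing in for a downloaded feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Story A</title><link>http://example.com/a</link></item>
    <item><title>Story B</title><link>http://example.com/b</link></item>
  </channel>
</rss>"""


def item_titles(rss_text):
    """Return the title of every <item> present in the RSS text."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


titles = item_titles(SAMPLE_RSS)
print(len(titles), titles)
```

A parser can only return the items that are actually in the document it was given, which is why the count stays small however the feed is read.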

When an RSS reader such as Feedreader shows more items than that as you scroll down, that's because the reader saves old items which are no longer in the feed. It's typical for web-based RSS readers to archive items in this manner.
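
The same archiving idea can be sketched in Python: poll the feed on a schedule and keep every item you have not seen before. This is a minimal sketch under assumptions, not how Feedreader works internally. The function names `merge_entries` and `poll`, the JSON file name, and the choice of `id` (falling back to `link`) as the key are all illustrative.

```python
import json
from pathlib import Path


def merge_entries(archive, entries):
    """Add unseen entries to `archive` (a dict keyed by id or link).

    Returns the number of entries that were new.
    """
    added = 0
    for entry in entries:
        # Prefer the feed's own id (guid); fall back to the link.
        key = entry.get("id") or entry.get("link")
        if key and key not in archive:
            archive[key] = {
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
            }
            added += 1
    return added


def poll(url, archive_path="feed_archive.json"):
    """Fetch the feed once and fold its current items into a JSON file."""
    import feedparser  # third-party: pip install feedparser

    path = Path(archive_path)
    # Load the archive from earlier runs, or start empty (assumed file layout).
    archive = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    added = merge_entries(archive, feedparser.parse(url).entries)
    path.write_text(json.dumps(archive, indent=2), encoding="utf-8")
    return added
```

Calling `poll('http://feeds.reuters.com/Reuters/worldNews')` from a scheduled job (hourly, say) would grow the archive over time. Items published before the first run cannot be recovered this way, because they are no longer in the feed.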

rcade

As rcade mentioned, most RSS feeds cover only the most recent items. However, you can collect a feed daily (or even hourly) and build up your own archive. If you want to do something like that, you can use the Python rssarchive library: https://pypi.org/project/rssarchive/

#!/usr/bin/env python
import rssarchive as ra

newra = ra.RssArchive(CONFIG_TEST_MODE=True, CONFIG_FULL_TEXT_MODE=False)
newra.batch_save_rss()

Suat Atan PhD