I am trying to fetch Google News results for about 5000 companies using Python.
I have scheduled the job to run every 12 hours.
What I actually do is build a link for each company from the Google News feed URL (https://news.google.com/news/feeds?q=MyQuery&output=rss) and then parse the returned XML to get the data I need.
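For reference, here is a minimal sketch of how the link can be built so that the whole query is URL-encoded. Escaping only the spaces is not enough for company names that contain characters such as `&`, which would otherwise cut the query short. `build_feed_url` and `BASE_URL` are hypothetical names used just for this illustration.

```python
from urllib.parse import urlencode

# Feed endpoint taken from the link above.
BASE_URL = "https://news.google.com/news/feeds"


def build_feed_url(query):
    """Return the RSS feed URL for a search query, with the query URL-encoded."""
    # urlencode escapes spaces, '&', '=' and other reserved characters in the value.
    return BASE_URL + "?" + urlencode({"q": query, "output": "rss"})
```

With this, `build_feed_url("AT&T")` keeps the ampersand inside the `q` parameter instead of splitting the query string at it.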
The issue is that it returns feeds for about 500 companies in roughly 20 minutes, and after that it starts returning empty results. If I open the same link in a browser it has entries, but during code execution it stops returning results after about 500 companies.
Now I am wondering: is there a rate limit for Google News, or a limit per unit of time?
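I do not know of a documented limit, so one thing I could try is spacing the requests out. This is only a sketch under that assumption: `paced` is a hypothetical helper, and whatever delay is passed in is a guess, not a value published by Google.

```python
import time


def paced(items, delay_seconds, sleep=time.sleep):
    """Yield items one by one, sleeping delay_seconds between consecutive items.

    `sleep` is injectable so the pacing can be checked without actually waiting.
    """
    first = True
    for item in items:
        if not first:
            # Wait before every item except the first one.
            sleep(delay_seconds)
        first = False
        yield item
```

The company loop could then iterate over `paced(companies, 2)` instead of `companies`, which would put two seconds between consecutive requests.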
Below is my code:
import traceback
from urllib.parse import quote_plus

import feedparser

companies = Company.objects.all()  # About 6000 Companies
for company in companies:
    try:
        # URL-encode the whole query, not only the spaces
        search_query = quote_plus(company.query)
        rss = "https://news.google.com/news/feeds?q=" + search_query + "&output=rss"
        feeds = feedparser.parse(rss)
        for post in feeds['entries']:
            try:
                url = post.link
                print("RSS Entry, Link: " + url)
                title = post.title
                print("Inserting Article (Title): " + title)
            except Exception:
                # Log the failure for this entry and continue with the next one
                print(traceback.format_exc())
    except Exception:
        # Log the failure for this company and continue with the next one
        print(traceback.format_exc())
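To tell whether the empty results are genuinely empty feeds or a sign that the requests are being refused, I could log what each request returned. A minimal sketch, assuming the parsed result behaves like the dictionary feedparser returns, with `status` (present only for HTTP fetches), `bozo` and `entries` keys. `describe_result` is a hypothetical helper, not part of feedparser.

```python
def describe_result(parsed):
    """Summarize a feedparser-style result (any mapping with the keys below).

    Keys read: 'status' (HTTP status code, may be missing),
    'bozo' (truthy when the response was not a well-formed feed),
    and 'entries' (list of parsed items).
    """
    status = parsed.get("status")
    entries = parsed.get("entries") or []
    if status is not None and status >= 400:
        return "http error %d" % status
    if parsed.get("bozo"):
        return "malformed response (%d entries)" % len(entries)
    if not entries:
        return "ok but empty"
    return "ok (%d entries)" % len(entries)
```

Calling `print(company.query, describe_result(feeds))` right after `feedparser.parse(rss)` would show whether the results after the first 500 companies come back with an error status, a malformed body, or a valid feed with no entries.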
I would much appreciate your help.
Thanks