Google News is searchable by keyword, and the search can then be narrowed down to a certain time period.

I tried doing the search on the website and then using the URL of the results page to reverse engineer the search in Python, thus:

import urllib2


url = 'https://www.google.com/search?hl=en&gl=uk&tbm=nws&authuser=0&q=apple&oq=apple&gs_l=news-cc.3..43j0l9j43i53.5710.6848.0.7058.5.4.0.1.1.0.66.230.4.4.0...0.0...1ac.1.SRcIeXL5d48'

handler = urllib2.urlopen(url)
html = handler.read()

However, I get a 403 error. The same method works with other websites, such as bbc.co.uk, so it seems Google does not want me to scrape the site with Python.

So I have two questions: 1) Is it possible to bypass this restriction Google has placed? If so, how? 2) Are there any other scrapeable news sites where I can search for news on a keyword for a given period?

For either option, I don't mind using a paid service, so such suggestions are welcome too.

Thanks in advance, K.

user3353185
  • You can also use Selenium to browse Google News and use urllib to fetch the individual links. Selenium with PhantomJS, or Selenium with chromedriver, works well for browsing Google News from Python. – Vamshi Jun 06 '17 at 17:14
  • Answer to the similar question about third-party Google News API with code example that sets date range of news: https://stackoverflow.com/a/61015947/1291371 – ilyazub Apr 03 '20 at 16:51

1 Answer

Try setting a User-Agent header on the request:

import urllib2

# `url` is the search URL from the question; send a browser-like User-Agent with it
req = urllib2.Request(url)
req.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3')
response = urllib2.urlopen(req)
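
In Python 3, urllib2 was split into urllib.request and urllib.error, so the same idea could be sketched as below. The helper names build_request and fetch and the DEFAULT_USER_AGENT string are made up for illustration, and there is no guarantee that Google will accept any particular User-Agent.

```python
import urllib.request

# Hypothetical default; any current browser User-Agent string could be used here.
DEFAULT_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request carrying a browser-like User-Agent header."""
    req = urllib.request.Request(url)
    req.add_header("User-Agent", user_agent)
    return req


def fetch(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch the page body as bytes; urlopen raises urllib.error.HTTPError on a 403."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()
```

Calling fetch(url) with the search URL from the question would then return the raw HTML, assuming the server accepts the request.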
Mihai Maruseac