0

I want to extract top 50 results from google search and get the title and snippet for each search result. I am using the following code.

#!/usr/bin/python3
import json
import urllib.request, urllib.parse

def showsome(searchfor):
   query = urllib.parse.urlencode({'q': searchfor})
   url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
   search_response = urllib.request.urlopen(url)
   search_results = search_response.read().decode("utf8")
   results = json.loads(search_results)
   data = results['responseData']
   print('Total results: %s' % data['cursor']['estimatedResultCount'])
   print(data['results'])
   hits = data['results']
   print('Top %d hits:' % len(hits))
   print(hits)

for h in hits:
    print(' ',h['title'])
print(' ', h['url'])

showsome('jaguar')

But I am only getting 4 results. ie the results before the image search comes on search results page. Can someone please suggest a better method for this task. It will be better if you can give a genereal way which can work on other search engines also like say yahoo.com

ronilp
  • 455
  • 2
  • 9
  • 25

1 Answers1

0

As stated here, that API has been deprecated. It still seems operational, but I wouldn't rely on it remaining in service. You should look for an alternative API.

Nevertheless, the default number of results per query is 4. The minimum is 1, the maximum is 8, and this can be set with the rst query parameter, i.e. append &rst=8 to get 8 results per query.

You will need to make additional queries to retrieve more results. The first result is specified with the start query parameter, e.g. &start=4 will return results from the 4th onward. You can use results['responseData']['cursor'] to give you a mapping of page numbers to start offsets, e.g:

>>> pprint(results['responseData']['cursor'])
{'currentPageIndex': 0,
 'estimatedResultCount': '29600000',
 'moreResultsUrl': 'http://www.google.com/search?oe=utf8&ie=utf8&source=uds&start=0&hl=en&q=jaguar',
 'pages': [{'label': 1, 'start': '0'},
           {'label': 2, 'start': '4'},
           {'label': 3, 'start': '8'},
           {'label': 4, 'start': '12'},
           {'label': 5, 'start': '16'},
           {'label': 6, 'start': '20'},
           {'label': 7, 'start': '24'},
           {'label': 8, 'start': '28'}],
 'resultCount': '29,600,000',
 'searchResultTime': '0.19'}

Details can be found in the linked documentation, see the section entitled "Standard URL Arguments".

Yahoo's API will be different (I expect), so this method will not work there.

mhawke
  • 84,695
  • 9
  • 117
  • 138
  • Go to google and search. I found this: https://developers.google.com/custom-search/json-api/v1/overview , but I have not used it, so I can't recommend it either way. There is some discussion [here](http://stackoverflow.com/q/4082966/21945). Some recommend Yahoo's API. – mhawke Apr 04 '15 at 13:06