
I want to get Google search results with Python. So far I have the following script, which I learned from this post:

# Python 2 script; only urllib and json are actually used below
import urllib
import json

def showSome(searchFor):
    query = urllib.urlencode({'q':searchFor})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s'%query
    searchResponse = urllib.urlopen(url)
    searchResults = searchResponse.read()
    results = json.loads(searchResults)
    data = results['responseData']
    print 'Total results: %s'%data['cursor']['estimatedResultCount']
    hits = data['results']
    print 'Top %d hits'%len(hits)
    for h in hits:
        print ' ', h['url']

showSome("site:www.hitmeister.de/shops/")

It reports 4380 results, but when I run the same query in a browser I get about 6650 results. How can I extract all of the results from Google? Also, the script only gives me the top 4 hits. How can I fetch all of them?

user873286

2 Answers


The problem here is that Google's result counts are always estimates, nothing more. These estimates may vary depending on a number of factors, apparently including whether you search via the API or from a web browser. In fact, Google has been known to return different estimates when you run the same query from different browsers on the same system. This could perhaps be explained by a different server answering your query, but I doubt it; Google is certainly known to take the search context into account.

See also this short piece and Google's documentation on the subject. Although that appendix seems to have been written specifically for the Google Search Appliance, it gives a good description of how accurate these result counts are.

On a practical note, Google will never return more than 1,000 hits for a query anyway, so you'll never get all the results for a query, whatever the initial estimate. Admittedly, I haven't tried requesting more than 1,000 results from the API, but this is the behaviour of the web interface, and I assume the API has the same limitation.
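Taking that 1,000-hit ceiling at face value, a paginated client can work out in advance which offsets are worth requesting at all. This is a minimal sketch assuming a zero-based `start`-style offset and a fixed page size; the cap itself is the answer's own assumption for the API, and `page_offsets` is a hypothetical helper, not part of any Google library.

```python
def page_offsets(estimated_total, page_size, hard_cap=1000):
    """Return the zero-based start offsets needed to page through results.

    Assumes (unverified for the API) that hits beyond `hard_cap` are never
    served, no matter how large the reported estimate is.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Only the smaller of the estimate and the cap is reachable.
    reachable = min(estimated_total, hard_cap)
    return list(range(0, reachable, page_size))
```

For example, with the question's browser estimate of about 6650 and 10 hits per page, `page_offsets(6650, 10)` yields 100 offsets, from 0 to 990; under the cap assumption, the rest of the estimated hits are simply unreachable.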

Daan

Google is very complex, and the results depend on many different parameters.

For example, if I search for a term on google.co.uk, I get different results than on google.com.

Results can likewise differ for different user agents and cookies (e.g. because your cookie sets a different language).

It is also very important to note that the result count is not accurate; it is just Google's estimate. If you want to change this behaviour, I would try to send the same parameters with the AJAX request that you send with a normal search (including cookies, etc.).
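A minimal sketch of that idea using Python 3's `urllib.request`: build the AJAX request with the same headers a browser would send. The header values and the `build_search_request` name are hypothetical placeholders (copy the real values from your browser's developer tools), and whether the AJAX endpoint honours these headers the way the web interface does is an assumption. The function only constructs the request; it does not send it.

```python
import urllib.parse
import urllib.request

AJAX_ENDPOINT = 'http://ajax.googleapis.com/ajax/services/search/web'

def build_search_request(query, user_agent, accept_language, cookie=None):
    """Build (but do not send) an AJAX search request with browser-like headers."""
    # Same query-string construction as the question's script.
    params = urllib.parse.urlencode({'v': '1.0', 'q': query})
    headers = {
        'User-Agent': user_agent,            # mimic the browser you compare against
        'Accept-Language': accept_language,  # language preference the browser sends
    }
    if cookie is not None:
        headers['Cookie'] = cookie           # e.g. a cookie carrying a language setting
    return urllib.request.Request(AJAX_ENDPOINT + '?' + params, headers=headers)

req = build_search_request('site:www.hitmeister.de/shops/',
                           user_agent='Mozilla/5.0 (hypothetical)',
                           accept_language='de-DE,de;q=0.9')
# urllib.request.urlopen(req) would then send it with those headers.
```

Note that `urllib.request.Request` normalises header names, so they are read back as `req.get_header('User-agent')` and `req.get_header('Accept-language')`.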

Ultimately, my counter-question would be: why do you need this? The count is usually inaccurate because it is only an estimate. Much more important is whether the top results are the same. If they are not, I think that would be a problem.
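If what matters is whether the top results agree, that can be checked directly instead of comparing counts. A minimal sketch, assuming you already have two ordered lists of result URLs (for example, one from the API and one copied from the browser); `top_overlap` is a hypothetical helper, not part of any Google API, and it deliberately ignores ordering within the top n.

```python
def top_overlap(urls_a, urls_b, n=4):
    """Fraction of the top-n URLs in urls_a that also appear in the top n of urls_b.

    Only membership is compared; the order inside the top n is ignored.
    """
    top_a = urls_a[:n]
    if not top_a:
        return 0.0
    top_b = set(urls_b[:n])
    return sum(1 for url in top_a if url in top_b) / len(top_a)
```

A result of 1.0 means both sources agree on which pages make up the top n (n defaults to 4, matching the question's "top 4 results"); anything lower points to the kind of discrepancy worth investigating.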

Dave Halter
  • I just want to get all the links in the results. How can I do this, or is it even possible to get all the result links? – user873286 May 07 '12 at 14:21
  • You won't get all the results with this method; you'll only get the top results. If you want everything, you have to iterate through the pages (check the Google API documentation for that). – Dave Halter May 07 '12 at 15:23
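Iterating through the pages can be sketched as a loop that advances an offset until a page comes back empty or a cap is reached. This is a Python 3 sketch under stated assumptions: the `start` (offset) and `rsz` (page size) query parameters are assumptions about the deprecated AJAX Search API, and `fetch_page` and `collect_urls` are hypothetical names. The fetcher is passed in as a parameter so the paging logic can be exercised without depending on the endpoint still being available.

```python
import json
import urllib.parse
import urllib.request

AJAX_ENDPOINT = 'http://ajax.googleapis.com/ajax/services/search/web'

def fetch_page(query, start, page_size):
    """Fetch one page of hits from the AJAX API (parameter names are assumptions)."""
    params = urllib.parse.urlencode(
        {'v': '1.0', 'q': query, 'start': start, 'rsz': page_size})
    with urllib.request.urlopen(AJAX_ENDPOINT + '?' + params) as response:
        data = json.loads(response.read().decode('utf-8'))
    # responseData is assumed to be null (None) when the API rejects the request.
    return (data.get('responseData') or {}).get('results', [])

def collect_urls(query, fetch=fetch_page, page_size=8, max_results=1000):
    """Page through results, collecting each hit's 'url' until a page is empty."""
    urls = []
    start = 0
    while start < max_results:
        hits = fetch(query, start, page_size)
        if not hits:  # no more pages
            break
        urls.extend(h['url'] for h in hits)
        start += page_size
    # The last page may overshoot the cap, so trim to max_results.
    return urls[:max_results]
```

Usage would be `collect_urls("site:www.hitmeister.de/shops/")`; as the answers point out, this still returns only what Google is willing to serve, not the full estimated count.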