
I was using the xgoogle Python library for one of my projects. It worked fine until recently, but now I no longer get the result set I used to get. If anyone who has used this library (written by Peteris Krumins) has run into the same problem, can you please suggest a workaround?

Joachim Sauer
Suhas

3 Answers


The presence of BeautifulSoup.py in the source hints that this library uses web scraping to get its results.

The common problem with this approach is that it breaks easily whenever the design or layout of the scraped page changes, and the problem you see seems to coincide with the new search results layout that Google introduced recently.

Another problem is that scraping is often against the terms of service of the site being scraped, and according to point 5.3 of the Google Terms of Service, it is:

You specifically agree not to access (or attempt to access) any of the Services through any automated means (including use of scripts or web crawlers) [...]

A better idea would be to use the Custom Search API.
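A minimal sketch of querying the Custom Search JSON API with only the Python standard library. It assumes you have already created an API key and a search engine ID (the `cx` parameter); `YOUR_API_KEY`, `YOUR_ENGINE_ID` and the helper names are placeholders made up for this example. The API returns at most 10 results per request, so you page through results with the 1-based `start` parameter.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of Google's Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, start=1, num=10):
    """Build a request URL; `start` is 1-based and `num` is capped at 10."""
    params = {
        "key": api_key,        # placeholder: your API key
        "cx": engine_id,       # placeholder: your search engine ID
        "q": query,
        "start": start,
        "num": min(num, 10),   # the API rejects values above 10
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Extract (title, link) pairs from a decoded JSON response."""
    # A response with no hits has no "items" key at all.
    return [(item.get("title", ""), item.get("link", ""))
            for item in payload.get("items", [])]


def search(api_key, engine_id, query, start=1):
    """Run one query against the API (performs a network request)."""
    url = build_search_url(api_key, engine_id, query, start=start)
    with urllib.request.urlopen(url) as response:
        return parse_results(json.load(response))
```

For example, `search("YOUR_API_KEY", "YOUR_ENGINE_ID", "xgoogle", start=11)` would fetch the second page of results. Keeping URL building and response parsing separate from the network call makes it easy to test without spending any of the daily query quota.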

Joachim Sauer
  • Note that the Custom Search API is heavily limited (100 queries/day). –  Sep 02 '11 at 06:37
  • @duskwuff: and web scraping is not permitted at all (see point 5.3 of their [Terms of Service](http://www.google.com/accounts/TOS)). – Joachim Sauer Sep 02 '11 at 06:39
  • Thanks a lot! The Custom Search API is very restricted, though. Is there any other way I can achieve this? It gives me at most 32 results at a time. – Suhas Sep 02 '11 at 08:38

Peteris Krumins's xgoogle library looks extremely useful, both to me and, I imagine, to many others: https://github.com/pkrumins/xgoogle

For me, the current version (1.3) does not work: I did a fresh install from GitHub, ran the examples, and nothing is returned.

Stepping through the source in a debugger and tracing the data captured by a query to the point where it disappears, I found that the problem occurs in search.py, in the routine "_extract_results", at this parser call:

results = soup.findAll('li', {'class': 'g'})

The soup object has content in it, but the findAll call returns nothing.

It looks like it is searching for list items (`<li>` elements with class `g`), and if there are none it returns nothing. I am unsure what HTML you would match instead to get a result. If anyone knows how to make this work, I am very interested.
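When the soup has content but the selector finds nothing, one way to see what to match instead is to take an inventory of which tags carry which CSS classes in a saved results page. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup, so it is only a rough stand-in for the `findAll('li', {'class': 'g'})` call; the helper names and the sample markup in the usage note are invented for illustration and are not Google's real layout.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each (tag, CSS class) pair appears in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Attributes without a value (e.g. <input disabled>) give None.
            if name == "class" and value:
                # An element may carry several space-separated classes.
                for css_class in value.split():
                    self.counts[(tag, css_class)] += 1


def class_census(html):
    """Return a Counter of (tag, class) pairs found in `html`."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def count_result_items(html, tag="li", css_class="g"):
    """Roughly what the xgoogle selector looks for: <li class="g">."""
    return class_census(html)[(tag, css_class)]
```

With hypothetical markup, `count_result_items('<ol><li class="g">A</li><li class="g">B</li></ol>')` finds two items, while `count_result_items('<div class="g">A</div>')` finds none, even though `class_census` shows the class moved to a `div`. On a real saved page, `class_census(html).most_common(20)` gives a starting point for a new selector, with the caveat that it will break again the next time the layout changes.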

Keir

After a little more googling, it appears xgoogle is no longer supported and no longer works. Part of the trouble is that Google changes the layout of its results pages every so often, so any scraping software that assumes a fixed layout is eventually doomed to fail.

There are, however, search engines you can install locally. Their results layout is less likely to change with upgrades, and will not change at all if you don't upgrade.

I am currently investigating YaCy. It is easy to install and can be pointed at specific sites if you want.
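A locally installed engine can usually be queried through a structured API instead of scraped HTML. The sketch below assumes a YaCy peer running at its default address `http://localhost:8090` and its `yacysearch.json` endpoint with `query`, `startRecord` and `maximumRecords` parameters, returning results under `channels[].items[]` with `title` and `link` fields. Treat those names as assumptions and check them against the documentation for your YaCy version; the helper functions are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed default address of a locally running YaCy peer.
YACY_BASE = "http://localhost:8090"


def yacy_search_url(query, start=0, count=10, base=YACY_BASE):
    """Build a yacysearch.json URL (parameter names assumed, see above)."""
    params = {"query": query, "startRecord": start, "maximumRecords": count}
    return base + "/yacysearch.json?" + urllib.parse.urlencode(params)


def parse_yacy(payload):
    """Pull (title, link) pairs out of a decoded yacysearch.json response."""
    results = []
    # Assumed shape: {"channels": [{"items": [{"title": ..., "link": ...}]}]}
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


def yacy_search(query, start=0, count=10):
    """Query the local peer (performs a network request)."""
    with urllib.request.urlopen(yacy_search_url(query, start, count)) as response:
        return parse_yacy(json.load(response))
```

Because the response is JSON rather than a rendered results page, a layout redesign in the web interface should not break the parsing, which is the main advantage over scraping.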

Keir