
I am currently working on an app that needs to scrape data from Google's search results, for example google.com/search?q=domain.com and so on. However, Google blocks my IP address after I make a number of requests. I know there are Google APIs, but there are many sites around that just scrape the data directly.

halfer
Jimmy Thakkar
  • 2
    See http://stackoverflow.com/questions/1096287/commercial-use-of-google-api I think you need to enter into commercial agreement with Google. Otherwise you may be against the T&C of many of their APIs. – Preet Sangha Oct 12 '12 at 10:20
  • 1
    Thanks Preet, but I have seen and used many sites that don't use Google's APIs and just scrape the data from Google search with cleverly written scripts. So if you know of anything like that, please do let me know. Thanks though. – Jimmy Thakkar Oct 12 '12 at 10:52
  • 1
    Given that Google will probably block this, it's risky to base any of your enterprises upon it, unless you can afford for them to fail. There are probably many search and scrape services, but you'll probably not get this for free from anyone, imo. – halfer Mar 18 '13 at 21:11

2 Answers


Scraping Google search results is a breach of the terms of service. Google actively discourages it and blocks those who do it. They share their information with you free of charge, but they don't appreciate you trying to get a copy of all of it.

Better to do your own crawling of the domain.

Brian White

Too bad I did not see your question earlier. In case it's not too late:

Scraping Google does indeed violate their terms of service. On the other hand, you may choose not to accept them. You accept their TOS when you create a Google account, for example, but as far as I know you can also withdraw that acceptance later (at least when they change the terms).

For a smaller amount of data you can use their API, or their commercial API. But if you need the results and ranks exactly as a user sees them (for SEO purposes), I know of no official way to get their permission.

I am not a lawyer, so you might want to consult one if you want to make sure about legal consequences.

However, scraping Google usually does not lead to any legal problems. I remember that even Bing (Microsoft's search engine) got caught scraping Google for unknown keywords a few years ago. My personal guess is that the majority of their original results were copied from Google in secret.

There is an open source project, http://google-rank-checker.squabbel.com, which can scrape large amounts of Google results. As far as I remember, without modification it is limited to about 50-70k result pages per day. I suggest taking a look at the code; it's PHP with libcURL.

You will need proper IP addresses (not shared, not previously abused) as well. Scraping with a single IP will get you blocked by Google within an hour. Usually the first thing that happens is a captcha; by solving the captcha you generate a cookie that allows you to keep making requests. If you continue, you will get a complete ban. And if you "hammer" Google with a huge number of requests, you will alert their staff, and they can put a manual ban on the whole ISP or network block.

A reasonable rate is around 10 requests per hour per IP; that's what I have been sticking to in my related projects.
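To make the per-IP budget concrete, here is a minimal Python sketch of a sliding-window limiter. It only does the bookkeeping; it sends no requests. The class name `PerIpRateLimiter` and its methods are made up for illustration, and the defaults (10 requests per 3600 seconds) simply mirror the figure above rather than any documented Google limit.

```python
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Hypothetical helper: allow at most max_requests per window per IP."""

    def __init__(self, max_requests=10, window_seconds=3600.0):
        # Defaults are an assumption taken from the "10 per hour" rule of thumb.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # ip -> timestamps of recent requests

    def seconds_until_allowed(self, ip, now):
        """Return 0.0 if a request may be sent from ip now, else the wait time."""
        history = self._history[ip]
        # Forget requests that have already left the sliding window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) < self.max_requests:
            return 0.0
        # The oldest remembered request must expire before another is allowed.
        return history[0] + self.window_seconds - now

    def record(self, ip, now):
        """Remember that a request was sent from ip at time now."""
        self._history[ip].append(now)
```

In a real loop you would pass `time.monotonic()` as `now`, sleep for the returned number of seconds when it is non-zero, and call `record` right after each request. Taking the clock as a parameter keeps the class easy to test.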

So if you scrape Google, make sure you have functions that validate the results and watch for unexpected responses. When one occurs, your code should immediately stop accessing Google, so that it does not keep requesting a page that is just showing a captcha.
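A rough Python sketch of such a "validate, then stop" guard follows. Everything in it is an assumption made for illustration: the names `ScrapeGuard` and `ScrapingHalted`, the status codes, and the two markers (`/sorry/` in the final URL, "unusual traffic" in the body). They are heuristics rather than a documented contract, so adjust them to whatever block pages you actually observe.

```python
class ScrapingHalted(RuntimeError):
    """Raised when a response looks like a block or captcha page."""


# Assumed heuristics, not an official list.
BLOCK_STATUS_CODES = (429, 503)
BLOCK_URL_MARKER = "/sorry/"
BLOCK_BODY_MARKER = "unusual traffic"


class ScrapeGuard:
    """Hypothetical guard: once one response looks wrong, refuse all further work."""

    def __init__(self, min_results=1):
        self.min_results = min_results
        self.halted = False

    def ensure_running(self):
        """Call before every request; raises if scraping was halted earlier."""
        if self.halted:
            raise ScrapingHalted("scraping was halted after an unexpected response")

    def check(self, status_code, final_url, body, result_count):
        """Validate one response; halt permanently if it looks unexpected."""
        self.ensure_running()
        problem = None
        if status_code in BLOCK_STATUS_CODES:
            problem = "suspicious HTTP status %d" % status_code
        elif BLOCK_URL_MARKER in final_url:
            problem = "redirected to a block page"
        elif BLOCK_BODY_MARKER in body.lower():
            problem = "body looks like a captcha page"
        elif result_count < self.min_results:
            problem = "parsed only %d results" % result_count
        if problem is not None:
            self.halted = True
            raise ScrapingHalted(problem)
```

The guard is deliberately one-way: after the first suspicious response, `ensure_running` keeps raising until a person has looked at the problem and created a fresh guard.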

John