I want to write a Ruby script that prints the total number of results Google reports for a query like allinurl: http://www.example.net/Downloads.aspx?Doc=

I went through the source code of the results page and wrote the following Ruby script:

require "rubygems"
require "rest-client"

# Google search URL for the allinurl: query
url = "https://www.google.com.np/search?q=allinurl:+http://www.dpsmathuraroad.net/Downloads.aspx%3FDoc%3D&lr=&safe=active&hl=en&noj=1&biw=1366&bih=643&filter=0"
intel = RestClient.get(url)

# Save the raw response so it can be inspected later
File.open("dpsmathuraroad.txt", "w") { |f| f.write(intel.body) }

# Find the first line that contains the result-count element
ch = File.foreach("dpsmathuraroad.txt").find { |line| line =~ /<div id="resultStats">About / }

# Capture the number between "About " and " results"
# (a split on a [...] character class would break the line on single characters instead)
puts ch[/<div id="resultStats">About ([\d,]+) results/, 1] if ch
gets

The pattern /<div id="resultStats">About ([\d,]+) results/ in the code above is taken directly from the source code of the page.
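The extraction step can be tried in isolation, without contacting Google. This is a minimal sketch, assuming the result line looks like the markup quoted in the question; `parse_result_count` and the sample string are hypothetical and only illustrate matching the number with a capture group rather than splitting on a character class.

```ruby
# Hypothetical helper: pull the number out of a line such as
#   <div id="resultStats">About 1,230 results<nobr>
# Returns an Integer, or nil when the marker is absent
# (for example when a CAPTCHA page came back instead of results).
def parse_result_count(html)
  match = html.match(/<div id="resultStats">About ([\d,]+) results/)
  return nil unless match

  # Drop the thousands separators before converting
  match[1].delete(",").to_i
end

# Made-up sample line modelled on the markup in the question
sample = '<div id="sbfrm_l"><div id="resultStats">About 1,230 results<nobr> (0.25 seconds)</nobr></div></div>'
puts parse_result_count(sample) # prints 1230
```

Returning nil when the marker is missing lets the caller tell "no count found" apart from a real count of zero.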

Unfortunately, Google responds with a human-verification challenge (a CAPTCHA), which blocks the script.

How do I get past the CAPTCHA and get the desired result from a Ruby script like this? Can it be done using an API?

Rishav

2 Answers

You can't. That's exactly why Captchas exist. Scraping of any kind violates Google's terms of service, and they use Captchas to enforce that.

Sorry.

Jon Cairns
  • I've heard that it can be done using the Google Search API. If not, please tell me. – Rishav Jun 02 '16 at 11:01
  • @Rishav see http://stackoverflow.com/questions/4082966/what-are-the-alternatives-now-that-the-google-web-search-api-has-been-deprecated – Jon Cairns Jun 02 '16 at 11:11
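
As an illustration of the API route raised in the comments, here is a rough sketch using Google's Custom Search JSON API, which reports an estimated count in the `searchInformation.totalResults` field of its JSON response. Everything else is an assumption: the helper names `build_search_uri` and `total_results` are hypothetical, the `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` environment variables stand in for credentials you would have to create yourself, and it is not confirmed here that `allinurl:` behaves the same way as on the regular search page or that the counts match.

```ruby
require "json"
require "net/http"
require "uri"

# Endpoint of the Custom Search JSON API
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

# Build the request URI. api_key and engine_id are placeholders for
# credentials created in your own Google account (assumption).
def build_search_uri(query, api_key, engine_id)
  uri = URI(CSE_ENDPOINT)
  uri.query = URI.encode_www_form(key: api_key, cx: engine_id, q: query)
  uri
end

# Extract the estimated result count from a JSON response body.
# The API returns it as a string, so convert it to an Integer.
# Returns nil when the field is missing.
def total_results(json_body)
  data = JSON.parse(json_body)
  count = data.dig("searchInformation", "totalResults")
  count && Integer(count, 10)
end

# Only contact the API when run directly with both credentials set
if __FILE__ == $PROGRAM_NAME && ENV["GOOGLE_API_KEY"] && ENV["GOOGLE_CSE_ID"]
  uri = build_search_uri("allinurl: http://www.example.net/Downloads.aspx?Doc=",
                         ENV["GOOGLE_API_KEY"], ENV["GOOGLE_CSE_ID"])
  response = Net::HTTP.get_response(uri)
  puts total_results(response.body) if response.is_a?(Net::HTTPSuccess)
end
```

Keeping the URL building and JSON parsing in separate functions means both can be checked against sample data without making a network request.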

If you don't mind breaking their terms of service, there are APIs for CAPTCHA solving. These are often used in results scrapers, such as Serposcope.

One example is Anti-Captcha.

L Martin