0

I'm making a program that uses Google to search, but I can't because of an HTTP Error 403. Is there any way around it? I'm using mechanize to browse. Here is my code:

from mechanize import Browser

inp = raw_input("Enter Word: ")
Word = inp

SEARCH_PAGE = "https://www.google.com/"

browser = Browser()
browser.open( SEARCH_PAGE )
browser.select_form( nr=0 )

browser['q'] = Word
browser.submit()

Here is the error message:

Traceback (most recent call last):
File "C:\Python27\Project\Auth2.py", line 16, in <module>
browser.submit()
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 541, in submit
return self.open(self.click(*args, **kwds))
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 255, in _mech_open
raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Please help, and thank you.

Serial
  • You're going to end up getting temporarily banned by Google if you do this too many times. Using Google search programmatically is a paid service provided by the Custom Search API (100 free queries per day for development) – David Apr 18 '13 at 22:48
  • This problem looks awfully similar to [urllib2.HTTPError: HTTP Error 403: Forbidden](https://stackoverflow.com/questions/13303449/urllib2-httperror-http-error-403-forbidden/46213623#46213623) – Supreet Sethi Nov 06 '17 at 16:32

2 Answers

6

You can tell Mechanize to ignore the robots.txt file:

browser.set_handle_robots(False)
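
For example, a minimal sketch combining this with a browser-like User-Agent header, which the comments below suggest is also needed to get past the follow-up 403 Forbidden (the User-Agent string here is just an example):

from mechanize import Browser

browser = Browser()
# Do not fetch or obey /robots.txt
browser.set_handle_robots(False)
# Send a browser-like User-Agent; mechanize's default is often rejected
browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0')]

browser.open("https://www.google.com/")
browser.select_form(nr=0)
browser['q'] = raw_input("Enter Word: ")
response = browser.submit()
print response.read()[:500]  # first part of the results HTML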
Blender
  • Now I'm getting httperror_seek_wrapper: HTTP Error 403: Forbidden – Serial Apr 18 '13 at 22:26
  • @ChristianCareaga: You have to change your user agent: https://views.scraperwiki.com/run/python_mechanize_cheat_sheet/? – Blender Apr 18 '13 at 22:28
2

Mechanize tries to respect the crawling limitations announced by the site's /robots.txt file. Here, Google does not want crawlers to index its search pages.

You can ignore this limitation:

browser.set_handle_robots(False)

as stated in Web Crawler - Ignore Robots.txt file?

Also, I would recommend using Google's Custom Search API instead, which exposes a proper API with easily parseable results.
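
For instance, a minimal sketch against the Custom Search JSON API (the API key and search engine ID below are placeholders; you would create real ones in the Google API Console):

import json
import urllib
import urllib2

API_KEY = "YOUR_API_KEY"  # placeholder: key from the Google API Console
CSE_ID = "YOUR_CSE_ID"    # placeholder: custom search engine ID

query = raw_input("Enter Word: ")
params = urllib.urlencode({'key': API_KEY, 'cx': CSE_ID, 'q': query})
url = "https://www.googleapis.com/customsearch/v1?" + params

# The API returns JSON; each result is an entry in the 'items' list
results = json.load(urllib2.urlopen(url))
for item in results.get('items', []):
    print item['title'], '-', item['link']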

Nicolas Cortot