0

I'm making a program that uses Google to search, but I can't because of an HTTP Error 403. Is there any way around it? I'm using mechanize to browse. Here is my code:

from mechanize import Browser

inp = raw_input("Enter Word: ")
Word = inp

SEARCH_PAGE = "https://www.google.com/"

browser = Browser()
browser.open( SEARCH_PAGE )
browser.select_form( nr=0 )

browser['q'] = Word
browser.submit()

Here is the error message:

Traceback (most recent call last):
File "C:\Python27\Project\Auth2.py", line 16, in <module>
browser.submit()
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 541, in submit
return self.open(self.click(*args, **kwds))
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 203, in open
return self._mech_open(url, data, timeout=timeout)
File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 255, in _mech_open
raise response
httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Please help, and thank you.

Serial
  • You're going to end up getting temporarily banned by Google if you do this too many times. Using Google search programmatically is a paid service provided by the Custom Search API (100 free queries per day for development) – David Apr 18 '13 at 22:48
  • This problem looks awfully similar to [urllib2.HTTPError: HTTP Error 403: Forbidden](https://stackoverflow.com/questions/13303449/urllib2-httperror-http-error-403-forbidden/46213623#46213623) – Supreet Sethi Nov 06 '17 at 16:32

2 Answers

6

You can tell Mechanize to ignore the robots.txt file:

browser.set_handle_robots(False)
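
For example, a minimal sketch combining this with a browser-like User-Agent header, which the comments below suggest is also needed to get past the follow-up 403 Forbidden (the User-Agent string here is just an example):

from mechanize import Browser

browser = Browser()
# Do not fetch or obey /robots.txt
browser.set_handle_robots(False)
# Send a browser-like User-Agent; mechanize's default is often rejected
browser.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0')]

browser.open("https://www.google.com/")
browser.select_form(nr=0)
browser['q'] = raw_input("Enter Word: ")
response = browser.submit()
print response.read()[:500]  # first part of the results HTML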
Blender
  • Now I'm getting httperror_seek_wrapper: HTTP Error 403: Forbidden – Serial Apr 18 '13 at 22:26
  • @ChristianCareaga: You have to change your user agent: https://views.scraperwiki.com/run/python_mechanize_cheat_sheet/? – Blender Apr 18 '13 at 22:28
2

Mechanize tries to respect the crawling limitations announced by the site's /robots.txt file. Here, Google does not want crawlers to index its search pages.

You can ignore this limitation:

browser.set_handle_robots(False)

as stated in Web Crawler - Ignore Robots.txt file?

Also, I would recommend using Google's Custom Search API instead, which exposes a proper API with easily parseable results.
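
For instance, a minimal sketch against the Custom Search JSON API (the API key and search engine ID below are placeholders; you would create real ones in the Google API Console):

import json
import urllib
import urllib2

API_KEY = "YOUR_API_KEY"  # placeholder: key from the Google API Console
CSE_ID = "YOUR_CSE_ID"    # placeholder: custom search engine ID

query = raw_input("Enter Word: ")
params = urllib.urlencode({'key': API_KEY, 'cx': CSE_ID, 'q': query})
url = "https://www.googleapis.com/customsearch/v1?" + params

# The API returns JSON; each result is an entry in the 'items' list
results = json.load(urllib2.urlopen(url))
for item in results.get('items', []):
    print item['title'], '-', item['link']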

Nicolas Cortot