I have installed Tor + Privoxy on my server, and they're working fine (tested).
But when I use urllib2 (Python) to scrape Google Shopping results through the proxy, Google always blocks me (sometimes with a 503 error, sometimes with a 403 error). Does anyone have a solution that would help me avoid this? Any help would be much appreciated!
The source code that I am using:
import urllib2

_HEADERS = {
    'User-Agent': 'Mozilla/5.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'deflate',
    'Connection': 'close',
    'DNT': '1'
}

# q and tbm belong in the query string: anything after '#' is a client-side
# fragment and is never sent to the server.
request = urllib2.Request("https://www.google.com/search?q=iphone+5&tbm=shop",
                          headers=_HEADERS)

# Map https as well as http, otherwise the https:// request bypasses Privoxy/Tor.
proxy_support = urllib2.ProxyHandler({"http": "127.0.0.1:8118",
                                      "https": "127.0.0.1:8118"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)

try:
    response = urllib2.urlopen(request)
    html = response.read()
    print html
except urllib2.HTTPError as e:
    print e.code    # e.g. 403 or 503 when blocked
    print e.reason
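For comparison, here is a minimal Python 3 sketch of the same request (in Python 3, urllib2 was split into urllib.request and urllib.error). It assumes Privoxy is listening on 127.0.0.1:8118 as in the code above; the helper names build_search_request, build_proxied_opener and fetch_html are hypothetical. Two details matter: the q and tbm parameters go in the query string of /search rather than after '#', because a URL fragment is never sent to the server, and both the http and https schemes are mapped to the proxy, since an https:// URL is only proxied when there is an "https" entry. The sketch also leaves out the Accept-Encoding: deflate header, because urllib does not decompress response bodies on its own.

```python
import urllib.parse
import urllib.request

# Assumed Privoxy address, taken from the question's code.
PRIVOXY = "http://127.0.0.1:8118"

# Same headers as the question, minus Accept-Encoding (urllib would hand back
# the compressed bytes without decoding them).
HEADERS = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Connection": "close",
    "DNT": "1",
}


def build_search_request(query, tbm="shop"):
    """Build a GET request with the search terms in the query string.

    Hypothetical helper. A '#q=...' fragment would never reach the server,
    so the parameters are URL-encoded after '?' on the /search path.
    """
    params = urllib.parse.urlencode({"q": query, "tbm": tbm})
    return urllib.request.Request(
        "https://www.google.com/search?" + params, headers=HEADERS
    )


def build_proxied_opener(proxy=PRIVOXY):
    """Return an opener that sends both http:// and https:// URLs via Privoxy.

    Hypothetical helper. For https:// URLs urllib opens a CONNECT tunnel
    through the proxy, which Privoxy then forwards to Tor.
    """
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch_html(query, timeout=30):
    """Fetch the results page through the proxy (performs network I/O).

    Raises urllib.error.HTTPError for responses such as 403 or 503.
    """
    opener = build_proxied_opener()
    with opener.open(build_search_request(query), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Building the request and opener does no network I/O, so the proxy mapping and URL can be checked offline before calling fetch_html("iphone 5").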
Note: when I don't use the proxy, it works fine!
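Since the blocking shows up as intermittent 403 and 503 responses, one common way to handle errors like these is to retry with exponential backoff. Below is a minimal Python 3 sketch; fetch_with_backoff and its parameters are hypothetical names. It only spaces out attempts: it does not stop Google from refusing requests from a given Tor exit address, so if every attempt fails, the last error is re-raised.

```python
import time
import urllib.error

# Status codes reported in the question when going through Tor/Privoxy.
RETRYABLE = {403, 503}


def fetch_with_backoff(fetch, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch() and retry on 403/503 with exponential backoff.

    Hypothetical helper. fetch is any zero-argument callable that returns the
    page body or raises urllib.error.HTTPError. Other status codes, and the
    failure on the final attempt, are re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as e:
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so the retry schedule can be checked without actually waiting; fetch can be, for example, a lambda wrapping opener.open(request).read().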