I'd like to download a series of PDF files from my intranet. I can view the files in my web browser without issue, but when I try to automate pulling them down with Python, I run into problems. After working through the proxy setup at my office, I can download files from the internet easily enough with this answer:
import urllib2

url = 'http://www.sample.com/fileiwanttodownload.pdf'
user = 'username'
pswd = 'password'
proxy_ip = '12.345.56.78:80'
proxy_url = 'http://' + user + ':' + pswd + '@' + proxy_ip
proxy_support = urllib2.ProxyHandler({"http": proxy_url})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
f.write(u.read())
f.close()
but for whatever reason it won't work when the URL points to something on my intranet. The following error is returned:
Traceback (most recent call last):
  File "<ipython-input-13-a055d9eaf05e>", line 1, in <module>
    runfile('C:/softwaredev/python/pdfwrite.py', wdir='C:/softwaredev/python')
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "C:/softwaredev/python/pdfwrite.py", line 26, in <module>
    u = urllib2.urlopen(url)
  File "C:\Anaconda\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Anaconda\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Anaconda\lib\urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Anaconda\lib\urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Anaconda\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Anaconda\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Anaconda\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: Service Unavailable
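One thing the error makes me wonder: since the request is forced through the office proxy, maybe the proxy itself cannot route to the intranet host, and intranet URLs should skip the proxy entirely. A minimal sketch of that idea (the hostname suffixes, helper name, and proxy address here are placeholders I made up, not anything from my real config):

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2

OFFICE_PROXIES = {
    "http": "http://12.345.67.89:80",
    "https": "http://12.345.67.89:80",
}

# Hostname suffixes that should bypass the proxy (placeholder values)
INTRANET_SUFFIXES = ("intranet.sample.com", ".internal")

def proxies_for(url):
    """Return the proxy mapping to pass to requests for this URL.

    Intranet hosts get an empty mapping so the library connects
    directly; everything else goes through the office proxy.
    """
    host = urlparse(url).hostname or ""
    if any(host == s.lstrip(".") or host.endswith(s) for s in INTRANET_SUFFIXES):
        return {}
    return OFFICE_PROXIES
```

The idea would be to call `requests.get(url, proxies=proxies_for(url), stream=True)` so only external traffic is proxied.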
Using requests
With the following code I can successfully pull files down from the internet, but when I try to pull a PDF from my office intranet, I just get a connection error sent back to me as HTML. This is the code:
import requests

url = 'http://www.intranet.sample.com/?layout=attachment&cfapp=26&attachmentid=57142'
proxies = {
    "http": "http://12.345.67.89:80",
    "https": "http://12.345.67.89:80"
}
local_filename = 'test.pdf'
r = requests.get(url, proxies=proxies, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        print chunk
        if chunk:
            f.write(chunk)
            f.flush()
And the html that comes back:
Network Error (tcp_error)
A communication error occurred: "No route to host"
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.
For assistance, contact your network support team.
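Since that HTML error page ends up written straight into test.pdf, I've also been thinking about guarding the download with a sanity check before saving anything. This is just a sketch of my own (the function name and checks are mine, not from any library), based on the facts that a real PDF body starts with the `%PDF` magic bytes and the proxy's error page comes back as HTML:

```python
def looks_like_pdf(status_code, headers, first_bytes):
    """Heuristic check that a response is really a PDF and not an
    HTML error page from the proxy.

    status_code -- integer HTTP status (e.g. r.status_code)
    headers     -- dict-like response headers (e.g. r.headers)
    first_bytes -- the first chunk of the body, as bytes
    """
    if status_code != 200:
        return False
    content_type = headers.get("Content-Type", "").lower()
    if "text/html" in content_type:
        return False
    # Every PDF file begins with the %PDF marker
    return first_bytes.startswith(b"%PDF")
```

The plan would be to peek at the first chunk from `r.iter_content()` and only open test.pdf for writing if this returns True, so a proxy error never masquerades as a downloaded PDF.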
Could there be some network security setting that prevents automated requests from outside the web browser environment?