
I'd like to download a series of PDF files from my intranet. I'm able to see the files in my web browser without issue, but when I try to automate pulling the files with Python, I run into problems. After working through the proxy setup at my office, I can download files from the internet quite easily with this answer:

import urllib2

url = 'http://www.sample.com/fileiwanttodownload.pdf'

user = 'username'
pswd = 'password'
proxy_ip = '12.345.56.78:80'
proxy_url = 'http://' + user + ':' + pswd + '@' + proxy_ip
proxy_support = urllib2.ProxyHandler({"http": proxy_url})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
f.write(u.read())
f.close()

but for whatever reason it won't work when the URL points to something on my intranet. The following error is returned:

Traceback (most recent call last):
  File "<ipython-input-13-a055d9eaf05e>", line 1, in <module>
    runfile('C:/softwaredev/python/pdfwrite.py', wdir='C:/softwaredev/python')
  File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "C:/softwaredev/python/pdfwrite.py", line 26, in <module>
    u = urllib2.urlopen(url)
  File "C:\Anaconda\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Anaconda\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Anaconda\lib\urllib2.py", line 442, in error
    result = self._call_chain(*args)
  File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Anaconda\lib\urllib2.py", line 629, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Anaconda\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Anaconda\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Anaconda\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Anaconda\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Anaconda\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: Service Unavailable

Using the requests library in the following code, I can successfully pull down files from the internet, but when I try to pull a PDF from my office intranet, I just get a connection error sent back to me as HTML. The following code is run:

import requests

url = 'http://www.intranet.sample.com/?layout=attachment&cfapp=26&attachmentid=57142'

proxies = {
  "http": "http://12.345.67.89:80",
  "https": "http://12.345.67.89:80"
}

local_filename = 'test.pdf'
r = requests.get(url, proxies=proxies, stream=True)
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        print chunk
        if chunk:
            f.write(chunk)
            f.flush()

And the HTML that comes back:

Network Error (tcp_error) 

A communication error occurred: "No route to host"
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.

For assistance, contact your network support team.

Is it possible that there is some network security setting that prevents automated requests outside of the web browser environment?
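
For what it's worth, one way to test that theory would be to send a browser-like request and see whether the response changes. This is only a sketch; the URL is the same placeholder as above and the header values are purely illustrative:

import requests

url = 'http://www.intranet.sample.com/?layout=attachment&cfapp=26&attachmentid=57142'

# Browser-like headers; the values here are illustrative, not required ones.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
    'Accept': 'text/html,application/pdf,*/*',
}

r = requests.get(url, headers=headers, stream=True)
print r.status_code
print r.headers.get('content-type')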

thomastodon
2 Answers


Installing openers into urllib2 doesn't affect requests. You need to use requests' own support for proxies. It should be enough to pass them in the proxies argument to get, or you can set the HTTP_PROXY and HTTPS_PROXY environment variables. See http://docs.python-requests.org/en/latest/user/advanced/#proxies

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)
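
If you'd rather not pass proxies on every call, the environment-variable route mentioned above works too, and it can carry the proxy credentials in the URL. This is only a sketch, reusing the placeholder address and credentials from the question:

import os
import requests

# requests picks these up automatically (trust_env is True by default).
os.environ['HTTP_PROXY'] = 'http://username:password@12.345.56.78:80'
os.environ['HTTPS_PROXY'] = 'http://username:password@12.345.56.78:80'

r = requests.get('http://www.sample.com/fileiwanttodownload.pdf', stream=True)
with open('fileiwanttodownload.pdf', 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)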
asmeurer

Have you tried not using the proxy at all when the file is on the intranet?

You could try something like this in Python 2:

from urllib2 import urlopen

url = 'http://intranet/myfile.pdf'
local_filename = 'myfile.pdf'

with open(local_filename, 'wb') as f:
    f.write(urlopen(url).read())
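
If you want to stay with requests, the same idea looks roughly like this; the intranet URL is a placeholder, and the key point is stopping requests from routing the intranet host through the corporate proxy:

import requests

url = 'http://www.intranet.sample.com/?layout=attachment&cfapp=26&attachmentid=57142'

session = requests.Session()
session.trust_env = False  # ignore HTTP_PROXY/HTTPS_PROXY so the proxy is bypassed

r = session.get(url, stream=True)
with open('test.pdf', 'wb') as f:
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)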
yensa