I'm trying to get the source code of a web page with Python and the urllib library. Unfortunately, if my URL is not secured (http:// rather than https://), I get a 401 error:

File "C:\Python27\ArcGIS10.1\lib\urllib.py", line 379, in http_error_default
raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 401, 'Unauthorized', <httplib.HTTPMessage instance at 0x031279E0>) 

For example, this code gives me the error:

import urllib
url = 'http://python.org/'
req = urllib.urlopen(url)

but this code works fine:

import urllib
url = 'https://python.org/'
req = urllib.urlopen(url)

I can access this website without any problems through my browser, but I'm unable to get the source code via Python.

Any ideas?

obchardon
  • The error message suggests that the website's server is denying you access, and it's not a problem with your code. I can run the same code and get the expected result. Your IP address might be blocked/blacklisted. – Underyx Jul 26 '16 at 12:23
  • Do you have a proxy? – Laur Ivan Jul 26 '16 at 12:23
  • Yes, I have a proxy. So is there nothing I can do? No way to get around this problem? – obchardon Jul 26 '16 at 12:24
  • Could it be a problem with the headers? – HolyDanna Jul 26 '16 at 12:24
  • With your edit regarding HTTPS working, I'm now suspecting that something might be interfering with your connections on your local network, or your proxy's side, blocking requests to that website. Could you try opening the same site in your browser, making sure that you're using the same proxy and not using HTTPS? – Underyx Jul 26 '16 at 12:25
  • Via the browser I can access both the http and https versions of the website. – obchardon Jul 26 '16 at 12:28
  • In that case my best guess is that your proxy is blocking requests that seem to have been made by a program. The urllib module by default sends HTTP requests with a header that specifies that the request was made from Python with urllib. HTTPS encrypts this information, so it makes sense that the proxy can't know it's not a human making the request. The solution is to not use a proxy, or to use another one, since it's obviously against your current proxy's terms of use to run a script making requests through them. – Underyx Jul 26 '16 at 12:33
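
If the proxy really is filtering on the default urllib User-Agent, as the last comment guesses, one thing that might be worth trying is sending a browser-like User-Agent instead. Below is a minimal sketch under that assumption (the real cause may be something else). The User-Agent string and the helper names `default_user_agent`, `build_request` and `fetch` are made up for illustration and do not come from the original post. The import fallback is there so the same code runs on Python 2.7 (`urllib2`) and on Python 3 (`urllib.request`):

```python
try:
    # Python 3
    from urllib.request import Request, build_opener, urlopen
except ImportError:
    # Python 2.7 (the version used in the question)
    from urllib2 import Request, build_opener, urlopen

# Assumed browser-like value; any string the proxy accepts would do.
BROWSER_UA = 'Mozilla/5.0 (Windows NT 6.1; rv:47.0) Gecko/20100101 Firefox/47.0'


def default_user_agent():
    """Return the User-Agent that an unmodified opener would send."""
    return dict(build_opener().addheaders)['User-agent']


def build_request(url, user_agent=BROWSER_UA):
    """Build a request object that carries a custom User-Agent header."""
    return Request(url, headers={'User-Agent': user_agent})


def fetch(url):
    """Download the page source, sending the custom User-Agent."""
    response = urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()

# Usage (performs a real network request, so it is left commented out):
# html = fetch('http://python.org/')
```

`default_user_agent()` shows the "Python-urllib/..." value the comment refers to, and `fetch()` replaces it. `urllib.urlopen` in Python 2.7 has no parameter for headers, which is why the sketch goes through `urllib2` there. If the proxy rejects the request for some other reason, this will not help.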

0 Answers