18

I'm trying to open a website (I am behind a corporate proxy) using urllib.request.urlopen() but I am getting the error:

urllib.error.HTTPError: HTTP Error 407: Proxy Authentication Required

I can find the proxy in urllib.request.getproxies(), but how do I specify a username and password to use for it? I couldn't find the solution in the official docs.

Lanaru
  • Have you seen http://stackoverflow.com/questions/34079/how-to-specify-an-authenticated-proxy-for-a-python-http-connection? Examples at the bottom of http://docs.python.org/library/urllib2.html#urllib2-examples. – Katriel Aug 01 '12 at 16:01
  • Yeah, but that's for Python 2.7. Didn't they restructure the entire urllib package in Python 3? – Lanaru Aug 01 '12 at 16:03
  • 1
    They didn't fundamentally change the interface -- just moved things around a bit. `ProxyHandler` now lives in [`urllib.request.ProxyHandler`](http://docs.python.org/release/3.0.1/library/urllib.request.html#urllib.request.ProxyHandler) – Katriel Aug 01 '12 at 16:04

2 Answers

29
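import urllib.request as req

# Replace username, password, url and port with your proxy's details.
proxy = req.ProxyHandler({'http': r'http://username:password@url:port'})
auth = req.HTTPBasicAuthHandler()
opener = req.build_opener(proxy, auth, req.HTTPHandler)
# Install the opener globally so that urlopen() goes through the proxy.
req.install_opener(opener)
conn = req.urlopen('http://google.com')
return_str = conn.read()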
Lanaru
  • 4
    Thanks. Is there no way to do this without supplying username and password? – tommy.carstensen Feb 08 '15 at 18:26
  • 4
    If you're worried about having credentials hard-coded in your source code (and thus leaking into git or other VCS artifacts, and so on) then the best approach is to use something like configparser, or YAML or JSON, to store the credentials in their own separate file. Build the ProxyHandler URL dynamically from the config settings. This allows your sources to be readable while keeping credentials confidential. – Jim Dennis Aug 05 '15 at 19:07
  • 2
    A minor note: for me, where he has "@url:port" I actually used the machine name "@machine:port", not a full URL. – mcherm Feb 02 '16 at 15:32
  • I'm using Python 2.7, how would this look? Would I have to use urllib2? – FancyDolphin Mar 01 '16 at 23:40
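The config-file suggestion in the comments above could look roughly like the following. This is a minimal sketch, not the answerer's code: the `[proxy]` section, its key names (`user`, `password`, `host`, `port`), the helper names `build_proxy_url` and `make_opener`, and the sample values are all assumptions made for illustration. The credentials are percent-encoded with `urllib.parse.quote`, so characters such as `@` or `:` in a password do not break the proxy URL.

```python
import configparser
import urllib.request as req
from urllib.parse import quote


def build_proxy_url(cfg, section="proxy"):
    """Build 'http://user:password@host:port' from a config section.

    The section and key names are hypothetical; adapt them to your file.
    """
    s = cfg[section]
    user = quote(s["user"], safe="")
    password = quote(s["password"], safe="")
    return "http://{}:{}@{}:{}".format(user, password, s["host"], s["port"])


def make_opener(proxy_url):
    """Return an opener that sends http and https requests via the proxy."""
    handler = req.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return req.build_opener(handler)


# Inline settings keep the sketch self-contained. In practice you would
# call cfg.read() on a separate file kept out of version control.
# interpolation=None lets values contain a literal '%'.
cfg = configparser.ConfigParser(interpolation=None)
cfg.read_string("""
[proxy]
user = alice
password = p@ss:word
host = proxy.example.com
port = 8080
""")

proxy_url = build_proxy_url(cfg)
opener = make_opener(proxy_url)
# opener.open('http://google.com') would now go through the proxy.
```

Using `opener.open()` directly, rather than `install_opener()`, keeps the proxy settings local to the code that needs them.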
3

You can supply your proxy credentials (username and password) through the http_proxy and https_proxy environment variables, which urllib.request reads when it connects to the website. This worked for me. To find your proxy server's address, use req.getproxies():

import urllib.request as req
import os

# Get your proxy server URL details.
print(req.getproxies())

# Set your credentials and proxy URL in the environment.
# Replace username, pwd and url with your own values.
os.environ['http_proxy'] = "http://username:pwd@url:80"
os.environ['https_proxy'] = "http://username:pwd@url:80"

conn = req.urlopen('https://Google.com')
return_str = conn.read()
Bernad Peter
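One detail of the environment-variable approach that may be worth checking: as far as I know, urlopen() builds its default opener on the first call and reuses it, so the variables need to be set before that first call. The sketch below uses a placeholder proxy address (an assumption, not a real server). It confirms that getproxies() reflects the variables, then builds an explicit opener from them instead of relying on the cached default.

```python
import os
import urllib.request as req

# Placeholder values (assumptions): substitute your own proxy details.
proxy_url = "http://username:pwd@proxy.example.com:80"
os.environ["http_proxy"] = proxy_url
os.environ["https_proxy"] = proxy_url

# getproxies() should now report the values set above.
proxies = req.getproxies()

# ProxyHandler() with no arguments reads the same settings, so building
# an opener here uses the current environment.
opener = req.build_opener(req.ProxyHandler())
# opener.open('https://Google.com') would go through the proxy.
```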