I'm currently working my way through the excellent Python Challenge (http://www.pythonchallenge.com/). The current problem I'm tackling involves the use of the urllib library but I'm having issues. I'm attempting to use this library to connect to the site through my company's firewall. Let's start with some code:

import urllib
proxy = {'http':'http://my.companys.proxy/proxy.pac'}
urllib.urlopen('http://www.pythonchallenge.com', proxies=proxy).read()

This yields an HTTP response, but strangely it is the Apache HTTP server test page:

...Red Hat Enterprise Linux Test Page... This page is used to test the proper operation of the Apache HTTP server after it has been installed, etc...

So I appear to be successfully achieving an HTTP connection outside our firewall, but I am getting a different HTTP response than my browser does. Another clue (or not) is what happens when I try to connect to the about.php page:

urllib.urlopen('http://www.pythonchallenge.com/about.php', proxies=proxy).read()

This, however, yields:

404 Not found... Apache 2.2.3 Red Hat Server at www.pythonchallenge.com Port 80

Both addresses above work just fine in my browser (using the same proxy). Any ideas where I'm going wrong?

Chris Knight

1 Answer

urllib does not support parsing a .pac file. The page you see is probably the Apache page for the server serving that .pac configuration file instead.

.pac files contain JavaScript code that presents your browser with proxy rules. You can open the file directly and see which proxy would be configured for the Python Challenge site. See http://en.wikipedia.org/wiki/Proxy_auto-config for more details on the file format.
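
As a rough aid for reading a .pac file, the sketch below lists the host:port values named in its PROXY directives. It does not evaluate the JavaScript, so it cannot tell which rule applies to a given URL; the sample file contents and the proxy.example.com address are made up for illustration.

```python
import re

def proxies_in_pac(pac_source):
    """Return the distinct 'host:port' values named in PROXY directives."""
    found = re.findall(r'PROXY\s+([\w.-]+:\d+)', pac_source)
    return sorted(set(found))

# Hypothetical .pac contents; a real file is usually longer and has more rules.
sample_pac = '''
function FindProxyForURL(url, host) {
    if (isPlainHostName(host)) return "DIRECT";
    return "PROXY proxy.example.com:8080; DIRECT";
}
'''

print(proxies_in_pac(sample_pac))
```

If the file names several proxies, you still have to read the surrounding conditions to see which one covers www.pythonchallenge.com.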

Once you have figured out which proxy server would be used, configure that server in the proxies mapping instead.
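
A minimal sketch of that last step, assuming the .pac file points at a proxy on proxy.example.com:8080 (a placeholder, not the real address). The question uses Python 2, where the same mapping is passed as proxies= to urllib.urlopen; the sketch uses Python 3's urllib.request, where the mapping goes into a ProxyHandler instead. The build_proxy_opener helper is a name invented here.

```python
import urllib.request

def build_proxy_opener(proxy_url):
    """Return an opener that sends http:// requests through proxy_url."""
    handler = urllib.request.ProxyHandler({'http': proxy_url})
    return urllib.request.build_opener(handler)

# Placeholder address: use the host and port found in the .pac file,
# not the URL of the .pac file itself.
opener = build_proxy_opener('http://proxy.example.com:8080')

# Uncomment to fetch the page through the proxy:
# html = opener.open('http://www.pythonchallenge.com').read()
```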

Martijn Pieters
  • Thanks. Makes sense. Looks like authentication is required to use the proxy specified in the .pac and one code library which might be of help (ntlmaps) is blocked from download by my company's IP filter. Pah. This is turning into a challenge indeed! – Chris Knight Mar 28 '13 at 11:05
  • If your proxy accepts HTTP BasicAuth authentication, you can add the username and password to the proxy map, see [How to specify an authenticated proxy for a python http connection?](http://stackoverflow.com/q/34079) – Martijn Pieters Mar 28 '13 at 11:08
  • My commiserations; corporate policy that blocks SourceForge, oh dear. – Martijn Pieters Mar 28 '13 at 11:25
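
For the Basic authentication case mentioned in the comments, the credentials can be embedded in the proxy URL. The sketch below assumes Python 3 and uses made-up values for the user name, password, host and port; authenticated_proxy_url is a helper invented here. It will not help with a proxy that only accepts NTLM.

```python
import urllib.request
from urllib.parse import quote

def authenticated_proxy_url(user, password, host, port):
    """Build a proxy URL of the form http://user:password@host:port."""
    # Percent-encode the credentials so characters such as '@' or ':'
    # are not mistaken for URL delimiters.
    return 'http://%s:%s@%s:%d' % (
        quote(user, safe=''), quote(password, safe=''), host, port)

# All four values are placeholders.
proxy_url = authenticated_proxy_url('jdoe', 'p@ss:word', 'proxy.example.com', 8080)

opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({'http': proxy_url}))

# Uncomment to fetch the page through the authenticated proxy:
# html = opener.open('http://www.pythonchallenge.com').read()
```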