
I am in the sticky situation where I cannot use the pacparser library, and I was hoping someone had a pure-Python solution (no C/C++ modules).

I have a PAC file that returns multiple proxies:

function FindProxyForURL(url, host)
{
    if (isPlainHostName(host))
    { return "DIRECT"; }

    // Internal network hosts
    if (isInNet(host, "158.232.0.0", "255.255.0.0") || isInNet(host, "127.0.0.1", "255.255.255.255") || isInNet(host, "10.0.0.0", "255.0.0.0"))
    { return "DIRECT"; }

    // Connect through proxy server for all other hosts. If proxy server is not available, connect directly
    return "PROXY proxy.site.com:3128; PROXY proxy02.site.com:3128; PROXY proxy05.site.com:3128; PROXY proxy03.site.com:3128; PROXY proxy04.site.com:3128";
}

How can I parse this using only Python, and what is the best way to tell which proxy is up?

Thank you and for your consideration to the academy! :)

Martin Tournoij
code base 5000
  • 1
    A PAC file is JavaScript with some extra built-in functions, and to parse it properly you need to execute the JavaScript (`pacparser` seems to use the SpiderMonkey library)... I don't think an all-Python solution exists, and making one yourself is not trivial... Why can't you use C modules? – Martin Tournoij Nov 13 '14 at 14:50
  • 1
    Could you use a regular expression to read out the proxies? Any suggestions on how I could use re to do this? I stink at regular expressions. It's a site issue, where I can't install C modules without jumping through hoops. – code base 5000 Nov 13 '14 at 16:57
  • 1
    Well, you could do that, of course, but what if the PAC script has two `return PROXY ...` statements? Which one should you use? Or what if it only wants you to use the proxy in a very specific case, say only if the domain is `stackoverflow.com`, and the proxy won't work for other domains? There are many more cases where a regexp will fail you... By the way, you don't need to "install" a C module system-wide; you can install it locally (possibly compiled on another machine). – Martin Tournoij Nov 13 '14 at 17:32

1 Answer


I've created a pure-Python library called PyPAC which should do what you're looking for. It provides a subclass of requests.Session that honours PAC files and includes PAC auto-discovery.
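On the "which proxy is up" part of the question: whatever evaluates the PAC file, the value returned by FindProxyForURL is a semicolon-separated list of `DIRECT` and `PROXY host:port` entries, tried in order. Below is a minimal standard-library sketch of splitting that string and probing each entry with a TCP connect. The function names `parse_pac_result` and `first_reachable` are made up for this illustration and are not part of PyPAC; a successful TCP connect only shows that something is listening on the port, not that the proxy will actually serve requests.

```python
import socket


def parse_pac_result(value):
    """Split a FindProxyForURL return value into (kind, host, port) tuples."""
    entries = []
    for part in value.split(";"):
        tokens = part.split()
        if not tokens:
            continue  # skip empty segments, e.g. after a trailing ';'
        kind = tokens[0].upper()
        if kind == "DIRECT":
            entries.append(("DIRECT", None, None))
        elif len(tokens) == 2:
            host, _, port = tokens[1].rpartition(":")
            if host and port.isdigit():
                entries.append((kind, host, int(port)))
    return entries


def first_reachable(entries, timeout=2.0):
    """Return the first entry that is DIRECT or accepts a TCP connection."""
    for kind, host, port in entries:
        if kind == "DIRECT":
            return (kind, host, port)
        try:
            # Only checks that the port accepts connections within the timeout.
            with socket.create_connection((host, port), timeout=timeout):
                return (kind, host, port)
        except OSError:
            continue  # unreachable or refused: try the next entry
    return None
```

For example, `parse_pac_result("PROXY proxy.site.com:3128; DIRECT")` yields one `PROXY` tuple followed by a `DIRECT` tuple, and `first_reachable` walks that list in order.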

Internally, it uses Js2Py, a Python library that parses and executes JavaScript. PyPAC then has Python implementations of the various JavaScript functions required by the PAC specification.
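To give a feel for what "Python implementations of the PAC functions" can mean, here is a rough standard-library sketch of the two helpers used in the question's PAC file, plus a hand translation of that file. This is not PyPAC's actual code; the names `is_plain_host_name`, `dns_resolve`, `is_in_net` and `find_proxy_for_url` are chosen for this example only.

```python
import ipaddress
import socket


def is_plain_host_name(host):
    """PAC isPlainHostName: true when the hostname contains no dots."""
    return "." not in host


def dns_resolve(host):
    """Resolve a hostname to an IPv4 string, or None if resolution fails."""
    try:
        return socket.gethostbyname(host)  # IP literals are returned unchanged
    except OSError:
        return None


def is_in_net(host, pattern, mask):
    """PAC isInNet: true when the host's address lies in pattern/mask."""
    address = dns_resolve(host)
    if address is None:
        return False
    network = ipaddress.ip_network(f"{pattern}/{mask}", strict=False)
    return ipaddress.ip_address(address) in network


def find_proxy_for_url(url, host):
    """Hand translation of the PAC file from the question (illustration only)."""
    if is_plain_host_name(host):
        return "DIRECT"
    if (is_in_net(host, "158.232.0.0", "255.255.0.0")
            or is_in_net(host, "127.0.0.1", "255.255.255.255")
            or is_in_net(host, "10.0.0.0", "255.0.0.0")):
        return "DIRECT"
    return ("PROXY proxy.site.com:3128; PROXY proxy02.site.com:3128; "
            "PROXY proxy05.site.com:3128; PROXY proxy03.site.com:3128; "
            "PROXY proxy04.site.com:3128")
```

Translating a PAC file by hand like this only works while the file never changes, which is why executing the real JavaScript is the more robust route.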

Carson Lam