
I am unable to use R functions that connect to another URL on the Internet (e.g. read_html, url.exists, etc.): I get time-outs and similar errors for pages that I can reach from a browser. I believe this is because R is not using the proxy settings mandated on my office network.

I've looked at another question on setting a proxy for R, but my situation differs in that we use a proxy auto-configuration (PAC) script.

I have tried the following settings:

setInternet2(F)
Sys.setenv(http_proxy_user="userid:password")
Sys.setenv(http_proxy="http://myproxypac.mydomain/proxy.pac")

but it didn't work.

Does anyone have suggestions for handling PAC files in R?

Asked by Ricky (last edited by Community)
  • I am not sure if this is what you are after, but I usually set my proxy like this: ```library(httr); set_config(use_proxy(url = "http://yourproxy", port = proxy_port_number))``` – Miha Trošt Nov 05 '15 at 08:01
  • Nope that didn't work. I think that works for direct proxies, i.e. under Windows' LAN Settings selecting "Use a proxy server ..." instead of selecting "Use automatic configuration script", which is my situation. – Ricky Nov 05 '15 at 08:46
  • Can you post the result of `curl::ie_proxy_info()` and `curl::ie_get_proxy_for_url()` ? – Jeroen Ooms Nov 05 '15 at 10:01
  • For `ie_proxy_info()`: `$AutoDetect` is `FALSE`, `$AutoConfigUrl` is `"http://proxypac./proxy.pac"`, `$Proxy` is `NULL`, `$ProxyBypass` is `NULL` – Ricky Nov 06 '15 at 02:10
  • for `ie_get_proxy_for_url` "proxy.:8080" – Ricky Nov 06 '15 at 02:12

1 Answer


There are several internet clients available in R, so the answer depends on which one you are using.

A PAC file is not a proxy server. It is just a piece of JavaScript that the client needs to execute to calculate the required proxy server for a given URL. So setting http_proxy to the address of the PAC file, as in your code above, cannot work.
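To make "calculating the proxy" concrete: the JavaScript function in a PAC file (conventionally `FindProxyForURL(url, host)`) returns a string such as `"PROXY proxy.example.com:8080; DIRECT"`, a semicolon-separated list of choices that the client tries in order. The R sketch below shows how such a result could be reduced to a single value for libcurl's proxy option. `pac_result_to_proxy()` is a hypothetical helper, not part of any package, and it assumes only `PROXY` and `DIRECT` entries appear. Since the value from `ie_get_proxy_for_url()` reported in the comments above (`"proxy.:8080"`) carries no `PROXY` keyword, the helper accepts both forms.

```r
# Hypothetical helper (not part of any package): reduce a PAC-style result,
# e.g. "PROXY proxy.example.com:8080; DIRECT", to a single "host:port" string.
# Returns NULL when the first choice is DIRECT, i.e. no proxy is needed.
pac_result_to_proxy <- function(result) {
  if (is.null(result)) return(NULL)
  # Collapse in case several strings were returned, then split the choices
  result <- paste(result, collapse = ";")
  entries <- trimws(strsplit(result, ";", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]
  if (length(entries) == 0L) return(NULL)

  first <- entries[[1]]  # clients try the entries in order
  if (toupper(first) == "DIRECT") return(NULL)
  # Only PROXY entries (or bare host:port values) are handled in this sketch
  if (grepl("^(SOCKS|HTTPS)[0-9]*[[:space:]]", first, ignore.case = TRUE)) {
    stop("unsupported PAC entry: ", first)
  }
  # Drop the PROXY keyword if present; a bare "host:port" passes through unchanged
  sub("^PROXY[[:space:]]+", "", first, ignore.case = TRUE)
}

pac_result_to_proxy("PROXY proxy.example.com:8080; DIRECT")
# returns "proxy.example.com:8080"
```

Only the first entry is used here, so fallbacks listed after it are ignored; a fuller client would retry them on failure.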

Companies use PAC files when different proxy servers are required to connect to different hosts (e.g. a special intranet proxy). Have a look at the source code of your PAC file to see what's going on. The curl package implements an actual PAC client in the ie_get_proxy_for_url() function, so you could wrap that to automatically find and set the correct proxy for a curl handle (see also the blog post):

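library(curl)

# Open a connection to `url` through whichever proxy the system PAC file picks for it
curl_with_proxy <- function(url, verbose = TRUE){
  # Runs the PAC script for this particular URL (Windows only)
  proxy <- ie_get_proxy_for_url(url)
  h <- new_handle(verbose = verbose, proxy = proxy)
  curl(url, handle = h)
}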

And then use it like this:

con <- curl_with_proxy("https://httpbin.org/get")
readLines(con)
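Before falling back to a fixed proxy, one way to check whether the PAC file really returns the same proxy for every host is to evaluate it for a handful of URLs. A minimal sketch, assuming `curl::ie_get_proxy_for_url()` (Windows only) returns `NULL` when no proxy is needed; `same_proxy_for_all()` is a hypothetical helper, and its `lookup` argument exists only so the logic can be tried without a PAC file.

```r
# Hypothetical helper: evaluate the PAC file for several URLs and report whether
# they all resolve to the same proxy. `lookup` defaults to curl's Windows-only
# PAC client; a NULL result is treated as a direct connection.
same_proxy_for_all <- function(urls, lookup = curl::ie_get_proxy_for_url) {
  proxies <- vapply(urls, function(u) {
    p <- lookup(u)
    if (is.null(p)) "DIRECT" else paste(p, collapse = ";")
  }, character(1), USE.NAMES = FALSE)
  # Return the shared value, or NA if the PAC file picks different proxies
  if (length(unique(proxies)) == 1L) proxies[[1]] else NA_character_
}

# On the office machine (placeholder URLs; include an intranet host you use):
# same_proxy_for_all(c("https://httpbin.org/get", "https://cran.r-project.org"))
```

Because `lookup` is evaluated lazily, passing your own function means the curl package is not touched at all.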

If it turns out that your PAC file simply returns proxy.<my.domain>:8080 for every URL, you might be able to set that proxy in an environment variable instead, but this only works for libcurl-based clients:

Sys.setenv(http_proxy_user = "userid:password")
Sys.setenv(http_proxy = "proxy.<my.domain>:8080")
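Note that libcurl looks at `http_proxy` for `http://` URLs but at `https_proxy` for `https://` URLs, and it accepts credentials embedded in the proxy URL itself. A sketch under those assumptions; `proxy_url_with_credentials()` is a hypothetical helper, and the host, user name and password are placeholders. Special characters in the credentials are percent-encoded, which libcurl should decode.

```r
# Hypothetical helper: build "http://user:password@host:port", percent-encoding
# reserved characters such as '@' or ':' in the user name and password
proxy_url_with_credentials <- function(hostport, user, password) {
  paste0("http://",
         URLencode(user, reserved = TRUE), ":",
         URLencode(password, reserved = TRUE), "@",
         hostport)
}

# Placeholder values: substitute the proxy your PAC file returns and your account
proxy <- proxy_url_with_credentials("proxy.example.com:8080", "userid", "password")

# libcurl looks at http_proxy for http:// URLs and https_proxy for https:// URLs
Sys.setenv(http_proxy = proxy, https_proxy = proxy)
```

Set the variables before making any requests in the session, since some clients only read them once.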

If you can't get it to work, please describe your problem in this GitHub issue. Perhaps your case can help us improve this part of the curl package.

Answered by Jeroen Ooms
  • Thanks @Jeroen. How do I find out which internet client my R is using? When I tried your suggestion, `proxy <- ie_get_proxy_for_url(url)` threw an error `Error in ie_get_proxy_for_url(url) : ERROR_WINHTTP_UNABLE_TO_DOWNLOAD_SCRIPT`. Should I add the above to the GitHub issue at this point, or should we try out more things here first? – Ricky Nov 11 '15 at 01:36
  • Maybe move to github. Can you test if the url from `ie_proxy_info()$AutoConfigUrl` is actually valid? – Jeroen Ooms Nov 11 '15 at 10:22
  • It is; it returns the `PAC` name as listed in the browser config (i.e. `"http://myproxypac.mydomain/proxy.pac"`) – Ricky Nov 12 '15 at 02:30
  • My bad, I had a malformed value in `url` hence the earlier error; your answer now works perfectly. Thanks! – Ricky Nov 13 '15 at 01:45
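Since the `ERROR_WINHTTP_UNABLE_TO_DOWNLOAD_SCRIPT` error above went away once `url` held a well-formed value, a cheap guard before calling `ie_get_proxy_for_url()` may save some debugging. A minimal sketch; `check_url()` is a hypothetical helper, and its pattern only checks for an http(s) scheme followed by a host, not full URL validity.

```r
# Hypothetical guard: stop early unless `url` is a single http(s) URL with a host
check_url <- function(url) {
  ok <- is.character(url) && length(url) == 1L && !is.na(url) &&
    grepl("^https?://[^/[:space:]]+", url)
  if (!ok) stop("expected a single http(s) URL, got: ", deparse(url))
  invisible(url)
}

check_url("https://httpbin.org/get")  # passes silently
```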