
I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication.

I already have code that will configure the RCurl and httr packages to use those settings by default - i.e.

httr::set_config(httr::config(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",
  proxyauth = 4
))

or

opts <- list(
  proxy = "my.proxy.address",
  proxyuserpwd = ":", 
  proxyauth = 4
)
options(RCurlOptions = opts)

However, in a couple of cases recently, I've found packages that depend on the curl package to make web requests - for instance xml2::read_xml - and I can't find any way to set the same proxy options so they're picked up by default and used by curl.

If I use curl directly myself, I can set the options on a new handle and the following code is sufficient to work successfully:

h = curl::new_handle(proxy = "my.proxy.address",
                     proxyuserpwd = ":")
con = curl::curl(url, handle = h)
page = xml2::read_xml(con)

... but this isn't any help when the use of curl is buried within someone else's function!

Alternatively, I know I can set up an environment variable for the proxy address, like this:

Sys.setenv(https_proxy = "https://my.proxy.address")

... and libcurl picks it up. But if I do just this, then I end up with an HTTP 407 proxy authentication error. Is there a way to specify blank username / password (as the proxyuserpwd setting does), so we authenticate with Windows credentials? It also doesn't seem possible to specify the proxyauth option as an environment variable.

Can anyone offer a solution or any suggestions, please?

djb72

2 Answers


I was having similar issues. Here are the steps that worked for me:

  1. Download my company's proxy auto-config (PAC) file. In Internet Explorer: click the gear icon --> Internet options --> Connections --> LAN settings, then copy the HTTP address shown there into a new browser window to download the text file.
  2. Locate the line in the PAC file that specifies the proxy (e.g. "auth-proxy.xxxxxxx.com:9999").
  3. In a new R session, test these proxy settings by temporarily setting them with a command similar to the following, substituting your values from your PAC file:

    Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")
    Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")
    
  4. Rerun your code in the same session to see if these new settings solve the issue. This is the test I used.

    xml2::read_html(curl::curl('http://google.com', handle = curl::new_handle("useragent" = "Mozilla/5.0")))
    

Setting the proxy using Sys.setenv will only persist to the end of your current session. To make a more permanent change you may consider adding this to your R_PROFILE as explained here.
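
As a rough sketch of what that could look like, assuming the same placeholder proxy address as in the steps above and a user-level startup file such as ~/.Rprofile (the exact file depends on your setup), the two Sys.setenv calls can be wrapped so they run at the start of every session:

```r
# Sketch of a startup-file fragment (for example ~/.Rprofile).
# The address is the placeholder from the steps above; substitute the
# value taken from your own PAC file.
local({
  proxy <- "auth-proxy.xxxxxxx.com:9999"
  Sys.setenv(http_proxy = proxy, https_proxy = proxy)
})
```

Wrapping the calls in local() keeps the helper variable out of the global workspace.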

Stan
  • This does not work for me; I keep getting the 407. The following does work, but it does not help, as I haven't found a way to set PROXYUSERPWD as a default: `handle = new_handle(); handle_setopt(handle, .list = list(PROXYUSERPWD = ":")); curl_download("http://orf.at", "test.html", handle = handle)` – Roman Jun 04 '21 at 13:30
  • @Roman - I do not have a proxy with a password to test this on, but from the help file: https://stat.ethz.ch/R-manual/R-patched/library/utils/html/download.file.html Usernames and passwords can be set for HTTP proxy transfers via environment variable http_proxy_user in the form user:passwd. Alternatively, http_proxy can be of the form http://user:pass@proxy.dom.com:8080/ for compatibility with wget. Only the HTTP/1.0 basic authentication scheme is supported. Under Windows, if http_proxy_user is set to ask then a dialog box will come up for the user to enter the username and password. – Stan Aug 06 '21 at 23:07

This is not a beauty, actually it is quite the opposite. But still this hack made it work for me:

library(curl)

new_handle_plain = curl::new_handle

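# Wrapper around the original new_handle: forwards any options the caller
# passes, then adds the empty proxy credentials
new_handle_ntlm = function(...){
  handle = new_handle_plain(...)
  handle_setopt(handle, .list = list(PROXYUSERPWD = ":"))
  return(handle)
}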

rlang::env_unlock(env = asNamespace('curl'))
rlang::env_binding_unlock(env = asNamespace('curl'))
assign('new_handle', new_handle_ntlm, envir = asNamespace('curl'))
rlang::env_binding_lock(env = asNamespace('curl'))
rlang::env_lock(asNamespace('curl'))


Sys.setenv("http_proxy" = curl::ie_get_proxy_for_url("http://orf.at"))
Sys.setenv("https_proxy" = curl::ie_get_proxy_for_url("http://orf.at"))
curl_download("http://orf.at", "test.html")

I would still love to see a clean solution as changing an inner function of a library is not something one should do...
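
One less invasive option, sketched here on the assumption that the function you need can also read from a local file, is to do the download yourself with a configured handle and pass the file on. fetch_via_proxy is a hypothetical helper name, and the proxy settings are the placeholder values from the question, not something I have verified against every proxy setup:

```r
# Hypothetical helper: download through the proxy with an explicitly
# configured handle, then return the path of the local copy.
fetch_via_proxy <- function(url, proxy = "my.proxy.address") {
  h <- curl::new_handle(
    proxy = proxy,       # placeholder address from the question
    proxyuserpwd = ":",  # empty credentials, as in the question
    proxyauth = 4        # value used in the question's httr/RCurl settings
  )
  dest <- tempfile()
  curl::curl_download(url, dest, handle = h)
  dest
}
```

For example, `xml2::read_xml(fetch_via_proxy(url))` parses the downloaded copy without touching the curl namespace. It does not help when a package offers no way to pass in a local file.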

Roman