Official Statements

In the past, base R's download.file() could not handle HTTPS URLs, and it was necessary to use RCurl. Since R 3.3.0:

All builds have support for https: URLs in the default methods for download.file(), url() and code making use of them. Unfortunately that cannot guarantee that any particular https: URL can be accessed. ... Different access methods may allow different protocols or use private certificate bundles ...

The download.file() help still says:

Contributed package 'RCurl' provides more comprehensive facilities to download from URLs.

which, by the way, include cookie and header management.

According to the RCurl FAQ (look for "When I try to interact with a URL via https, I get an error"), HTTPS URLs can be handled with:

getURL(url, cainfo="CA bundle")

where "CA bundle" is the path to a certificate authority bundle file. One such bundle is available from the curl site itself:
https://curl.haxx.se/ca/cacert.pem

Current status

The tests below were run on Windows.

For many HTTPS websites download.file() works as stated:

download.file(url="https://www.google.com", destfile="google.html")
download.file(url="https://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")

With RCurl, using the cacert.pem bundle downloaded above can produce an error:

library(RCurl)
getURL("https://www.google.com", cainfo = "cacert.pem")    
# Error in function (type, msg, asError = TRUE)  : 
#   SSL certificate problem: unable to get local issuer certificate

In this instance, simply removing the reference to the certificate bundle solves the problem:

getURL("https://www.google.com")                      # works
getURL("https://www.google.com", ssl.verifypeer=TRUE) # works

ssl.verifypeer = TRUE is set explicitly to make sure that the success is not due to getURL() silently disabling certificate verification. The argument is documented in the RCurl FAQ.

However, in other instances, the connection fails:

getURL("https://curl.haxx.se/ca/cacert.pem")
# Error in function (type, msg, asError = TRUE)  : 
#  error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

And similarly, using the previously downloaded bundle:

getURL("https://curl.haxx.se/ca/cacert.pem", cainfo = "cacert.pem")
# Error in function (type, msg, asError = TRUE)  : 
#   error:1407742E:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

The same error occurs even with certificate verification disabled:

getURL("https://curl.haxx.se/ca/cacert.pem", ssl.verifypeer=FALSE)
# same error as above

Questions

  1. How to use HTTPS properly in RCurl?
  2. As regards mere file downloads (no headers, cookies, etc.), is there any benefit in using RCurl instead of download.file()?
  3. Has RCurl become obsolete, and should we opt for curl instead?

Update

The issue persists as of R version 3.4.1 (2017-06-30) under Windows 10.

antonio
  • Winston Chang (author of the R Graphics Cookbook) has a repository called [downloader](https://github.com/wch/downloader) for downloading files over HTTPS; it has no external dependencies and is advertised as an alternative to RCurl. – ctesta01 Jul 12 '17 at 18:43
  • @ctesta01: Actually, none of the RCurl features are found in `downloader`. In fact, the latter is just "a wrapper for the `download.file` function". Note also that the package is rather old: in the past, perhaps, a wrapper could help to abstract platform-specific download issues, but currently `download.file` is optimised for the platform, so there is no practical point in using downloader. – antonio Jul 14 '17 at 12:24
  • @antonio: Are these sporadic failures, or do they fail consistently for you? I've tried each of the above scenarios several times, on R 3.3.3 (MacOS) and R 3.3.2 (Linux), both with RCurl 1.95-4.8, and I have yet to see a failure. – anthonyserious Aug 02 '17 at 12:59
  • @anthonyserious: Thanks for the feedback. Please see the update section. – antonio Aug 04 '17 at 11:57

1 Answer

The OpenSSL bundled with RCurl is currently rather old and does not support TLS v1.2, which would explain the "tlsv1 alert protocol version" error.
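One way to check this on a given machine is to ask each package which libcurl and SSL library it is linked against. The following is only a sketch: `ssl_backends` is a hypothetical helper name, and it assumes nothing beyond `RCurl::curlVersion()` and `curl::curl_version()`, reporting NA for a package that is not installed.

```r
# Hypothetical helper: report the libcurl and SSL library versions that
# RCurl and curl are linked against (NA if a package is not installed).
ssl_backends <- function() {
  out <- c(RCurl = NA_character_, curl = NA_character_)
  if (requireNamespace("RCurl", quietly = TRUE)) {
    v <- RCurl::curlVersion()
    out["RCurl"] <- paste0("libcurl ", v$version, ", ", v$ssl_version)
  }
  if (requireNamespace("curl", quietly = TRUE)) {
    v <- curl::curl_version()
    out["curl"] <- paste0("libcurl ", v$version, ", ", v$ssl_version)
  }
  out
}

print(ssl_backends())
```

If the two entries show different SSL library versions, that difference is a plausible reason why one package can negotiate a connection that the other cannot.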

Yes, the curl package works.
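For a plain file download (no headers or cookies), a minimal sketch with the curl package might look like the following. `fetch_file` is a hypothetical wrapper name introduced here for illustration; the only package call it relies on is `curl::curl_download()`.

```r
# Hypothetical helper: download `url` to `destfile` using the curl package
# and return the path of the downloaded file.
fetch_file <- function(url, destfile = basename(url)) {
  # mode = "wb" writes the file in binary mode, which matters on Windows
  curl::curl_download(url, destfile, quiet = TRUE, mode = "wb")
}
```

Calling `fetch_file("https://curl.haxx.se/ca/cacert.pem", "cacert.pem")` would then be the curl counterpart of the `download.file()` call shown in the question.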

Or you can use the httr package, which is a wrapper for the curl package:

> library("httr")
> GET("https://curl.haxx.se/ca/cacert.pem",config(sslversion=6,ssl_verifypeer=1))
Response [https://curl.haxx.se/ca/cacert.pem]
  Date: 2017-08-16 17:07
  Status: 200
  Content-Type: application/x-pem-file
  Size: 256 kB
<BINARY BODY>
Satie
  • As a package, RCurl is _relatively_ [recent (2016-03-01)](https://cran.r-project.org/web/packages/RCurl/index.html). If the bundled OpenSSL is older than this, that would amount to saying that RCurl is unmaintained and should not be on CRAN. – antonio Sep 23 '17 at 20:30