R produces "unsupported URL scheme" error when getting data from https sites

Question

R version 3.0.1 (2013-05-16) on Windows 8, knitr version 1.5, RStudio 0.97.551

I am using knitr to render the R Markdown for my analysis. As part of the analysis I download various data sets from the web. knitr is fine with getting data from http sites, but for https sites it generates an "unsupported URL scheme" message. I know that when using the download.file function on a Mac, the method parameter has to be set to "curl" to get data from an https site; however, this doesn't help when using knitr.

What do I need to do so that knitr will download data from https websites?

Edit: Here is the code chunk that returns an error in knitr but works without error when run directly in R.

```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```

possible duplicate of [Read a CSV from github into R](http://stackoverflow.com/questions/14441729/read-a-csv-from-github-into-r) — Thomas, Nov 10 '13 at 14:47
@Thomas Thank you for the suggestion; however, this problem is not about getting R to read a file from a website, it is about getting knitr to run the R code that reads a data file from a website. On Windows, accessing https sites with R is not an issue, but it becomes a problem when the same code is knitted with knitr. — Jonno Bourne, Nov 10 '13 at 17:41

score 20 · Answer 1 · answered May 22 '14 at 19:40

You can use https with the download.file() function by passing "curl" as the method argument:

download.file(url,destination,method="curl")
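As a rough sketch of how one might decide which method to pass, the helper below checks what the local R installation offers. The name `pick_download_method` is hypothetical and not part of the answer. It assumes that `method = "curl"` needs a curl executable on the PATH, and that newer versions of R (3.2.0 and later, as far as I know) also offer a built-in "libcurl" method.

```r
# Hypothetical helper (not from the answer): choose a download.file() method
# that is likely to handle https on this machine.
pick_download_method <- function() {
  # Newer R versions report built-in libcurl support through capabilities()
  if (isTRUE(unname(capabilities("libcurl")))) {
    return("libcurl")
  }
  # method = "curl" calls an external curl executable, so it must be on the PATH
  if (nzchar(Sys.which("curl"))) {
    return("curl")
  }
  # Otherwise fall back to letting download.file() decide
  "auto"
}

method <- pick_download_method()
print(method)

# Not run here because it needs network access:
# download.file(url, destination, method = method)
```

Whether the chosen method actually succeeds inside knitr still depends on the R version and platform, so this is only a starting point.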

Answered by Fabien Barbier

Thomas · Accepted Answer · 2016-05-12T10:51:04.880

Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.

You want something like this:

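library(RCurl)
# ssl.verifypeer=0L turns off SSL certificate verification;
# followlocation=1L follows HTTP redirects
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
               ssl.verifypeer=0L, followlocation=1L)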

That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:

writeLines(data,'temp.csv')
read.csv('temp.csv')

You can also parse the data directly without writing to a file:

read.csv(text=data)
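To illustrate the in-memory parsing step without needing a network connection, here is a small sketch. The string `csv_text` and its contents are made up for illustration and stand in for what `getURL()` would return.

```r
# Stand-in for the single string returned by getURL(); the contents are invented
csv_text <- "id,value\n1,10\n2,20\n3,30\n"

# The text argument makes read.csv() parse the string itself
# instead of treating it as a file name
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

str(df)
```

This avoids the temporary file entirely, which is convenient in a knitr document where the working directory may differ from an interactive session.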

Edit: A much easier option is actually to use the rio package:

library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")

This will read directly from the HTTPS URL and return a data.frame.

When I loaded the RCurl package, substituted your code into the knitr markdown file, and knitted it to HTML, it all worked perfectly, so thank you very much! — Jonno Bourne, Nov 10 '13 at 17:44

score 9 · Answer 3 · edited Jul 23 '15 at 13:49

Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.

setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")

answered Jun 03 '14 at 05:28 by Renhuai, edited Jul 23 '15 at 13:49 by Thomas

score 5 · Answer 4 · answered Aug 21 '14 at 20:27

I am sure you have already found a solution to your problem by now.

I was working on an assignment just now and ended up getting the same error. I tried some of the tricks, but they did not work for me, maybe because I am working on a Windows machine.

Anyhow, I changed the link to http: rather than https: and that did the trick.

The following is a chunk of my code:

# Create the working directory only if it does not exist yet
if (!file.exists("./PeerAssessment2")) {dir.create("./PeerAssessment2")}
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip")

install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
    bunzip2 ("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")

noaaData <- read.csv ('./PeerAssessment2/Data')

Hope this helps.
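The scheme change described above can be sketched as a one-line substitution. The helper name `to_http` is hypothetical. Note that this only works when the server offers the same file over plain HTTP, and that the transfer is then no longer encrypted.

```r
# Hypothetical helper: rewrite a leading https:// to http://
# and leave any other URL unchanged
to_http <- function(url) {
  sub("^https://", "http://", url)
}

# The https form of the URL used in the answer above
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
http_url <- to_http(fileURL)
print(http_url)
```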

score 4 · Answer 5 · answered Nov 15 '13 at 14:30

I had the same issue with knitr and download.file() with an https URL, on Windows 8.

You could try setInternet2(TRUE) before using the download.file() function. However, I'm not sure that this fix works on Unix-like systems.

setInternet2(TRUE)  # set the R_WIN_INTERNET2 to TRUE
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work

Source: R documentation (?download.file):

Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
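Since the quoted note is specific to Windows, one way to keep a knitr document portable is to guard the call. This is a minimal sketch: the helper name `needs_internet2` is hypothetical, and the 3.3.0 cutoff is an assumption taken from the accepted answer's remark that download.file() handles SSL automatically from that version on.

```r
# Hypothetical helper: TRUE only where the quoted note applies,
# i.e. a Windows build of R older than 3.3.0 (cutoff is an assumption)
needs_internet2 <- function() {
  .Platform$OS.type == "windows" && getRversion() < "3.3.0"
}

if (needs_internet2()) {
  # Only reached on old Windows builds, where setInternet2() is available
  setInternet2(TRUE)
}

print(needs_internet2())
```

On other platforms or newer R versions the guarded branch is skipped, so the chunk runs without referring to a function that may not be available.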

score 1 · Answer 6 · answered Jun 09 '14 at 20:13

I had the same problem with an https URL: the following code ran perfectly in R but produced "unsupported URL scheme" when knitting to HTML:

temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))

I tried all the solutions posted here and nothing worked. In absolute desperation I removed the "s" from "https" in the URL, and everything worked.

score 1 · Answer 7 · answered Aug 29 '15 at 21:27

Using the R downloader package takes care of the quirky details typically associated with file downloads. For your example, all you would have needed to do is:

```{r}
library(downloader)  # provides download(), a wrapper around download.file()
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```
