11

R version 3.0.1 (2013-05-16) on Windows 8, knitr version 1.5, RStudio 0.97.551

I am using knitr to render the markdown for my R code. As part of my analysis I download various data sets from the web. knitr is fine with getting data from http sites, but for https ones it generates an "unsupported URL scheme" message. I know that when using the download.file function on a Mac the method parameter has to be set to "curl" to get data from an https URL; however, this doesn't help when using knitr.

What do I need to do so that knitr will fetch data from https websites?

Edit: Here is the code chunk that returns an error in knitr but runs without error directly in R.

```{r}
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy")
```
Jilber Urbina
Jonno Bourne
  • possible duplicate of [Read a CSV from github into R](http://stackoverflow.com/questions/14441729/read-a-csv-from-github-into-r) – Thomas Nov 10 '13 at 14:47
  • @Thomas thank you for the suggestion; however, this problem is not about getting R to read a file from a website, it is about getting knitr to run the R code that reads a data file from a website. On Windows it is not an issue accessing https sites with R, but if you write markdown code with knitr it becomes a problem. – Jonno Bourne Nov 10 '13 at 17:41

7 Answers

20

You can use https URLs with the download.file() function by passing "curl" as the method argument:

download.file(url,destination,method="curl")
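As a rough sketch of how that choice could be automated, the helper below picks "curl" only for https URLs and only when a curl executable is found on the PATH. The name pick_download_method is hypothetical and not part of base R or knitr; this is one possible approach under those assumptions, not a definitive fix.

```r
# Hypothetical helper (not from any package): choose a download.file() method.
pick_download_method <- function(url) {
  # TRUE when the URL uses the https scheme
  is_https <- grepl("^https://", url, ignore.case = TRUE)
  # Sys.which() returns "" when the executable is not on the PATH
  has_curl <- nzchar(unname(Sys.which("curl")))
  if (is_https && has_curl) "curl" else "auto"
}

# Intended use (not run here, since it needs network access):
# download.file(fileurl, destfile = "ss06hid.csv",
#               method = pick_download_method(fileurl))
```

Falling back to "auto" keeps the default behaviour for plain http URLs and for machines without curl installed.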
Fabien Barbier
9

Edit (May 2016): As of R 3.3.0, download.file() should handle SSL websites automatically on all platforms, making the rest of this answer moot.

You want something like this:

library(RCurl)
data <- getURL("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv",
               ssl.verifypeer=0L, followlocation=1L)

That reads the data into memory as a single string. You'll still have to parse it into a dataset in some way. One strategy is:

writeLines(data,'temp.csv')
read.csv('temp.csv')

You can also parse the data directly without writing it to a file:

read.csv(text=data)
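To illustrate what the text argument does without needing a network connection, here is a minimal sketch. The string csv_text is made-up data standing in for whatever getURL() would return.

```r
# Made-up CSV content standing in for the string returned by getURL()
csv_text <- "id,value\n1,10.5\n2,20.25\n"

# read.csv() parses the string as if it were the contents of a file
parsed <- read.csv(text = csv_text)
str(parsed)
```

The result is an ordinary data.frame, so the rest of the analysis does not need to know the data arrived as a string.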

Edit: A much easier option is actually to use the rio package:

library("rio")
import("https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv")

This will read directly from the HTTPS URL and return a data.frame.

Thomas
  • When I loaded the RCurl package, substituted your code into the knitr markdown file, and knitted it to HTML, it all worked perfectly, so thank you very much! – Jonno Bourne Nov 10 '13 at 17:44
9

Use setInternet2(use = TRUE) before using the download.file() function. It works on Windows 7.

setInternet2(use = TRUE)
download.file(url, destfile = "test.csv")
Thomas
Renhuai
5

I am sure you have found a solution to your problem by now.

I was working on an assignment just now and ended up getting the same error. I tried some of the tricks above, but they did not work for me, maybe because I am working on a Windows machine.

Anyhow, I changed the link to http: rather than https: and that did the trick.

The following is a chunk of my code:

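# Create the working directory if it does not exist yet
if (!file.exists("./PeerAssessment2")) {dir.create("./PeerAssessment2")}
# Plain http:// scheme; the file is bzip2-compressed despite the .zip name used below
fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./PeerAssessment2/Data.zip")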

install.packages("R.utils")
library(R.utils)
if (!file.exists("./PeerAssessment2/Data")) {
    bunzip2 ("./PeerAssessment2/Data.zip", destname = "./PeerAssessment2/Data")
}
list.files("./PeerAssessment2")

noaaData <- read.csv ('./PeerAssessment2/Data')
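As a possible simplification of the decompress-then-read steps above, base R can usually read a bzip2-compressed CSV directly, because file connections opened for reading detect the compression. The sketch below shows this offline with a small made-up data frame written to a temporary .bz2 file; the file name and contents are illustrative only.

```r
# Write a small made-up data frame to a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file without a separate bunzip2 step
storm <- read.csv(tmp)
nrow(storm)
```

If this holds for the downloaded file too, the R.utils dependency could be dropped, though that is an assumption worth checking against the real data.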

Hope this helps.

user3694373
4

I had the same issue with knitr and download.file() with an https URL on Windows 8.

You could try setInternet2(TRUE) before calling download.file(). Note that setInternet2() is specific to R on Windows, so this fix is not expected to apply on Unix-like systems.

setInternet2(TRUE)  # set the R_WIN_INTERNET2 to TRUE
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download.file(fileurl, destfile = "C:/Users/xxx/yyy") # now it should work

Source: R documentation (?download.file):

Note that https:// URLs are only supported if --internet2 or environment variable R_WIN_INTERNET2 was set or setInternet2(TRUE) was used (to make use of Internet Explorer internals), and then only if the certificate is considered to be valid.
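Because the switch only exists on Windows builds of R, a document meant to knit on several platforms might guard the call. The helper below is a hypothetical sketch (enable_internet2 is not a real R function): it tries the switch only where it is available and reports whether it was applied.

```r
# Hypothetical guard: returns TRUE if setInternet2(TRUE) was applied, else FALSE.
enable_internet2 <- function() {
  on_windows <- .Platform$OS.type == "windows"
  if (!on_windows || !exists("setInternet2", mode = "function")) {
    return(FALSE)
  }
  # Look the function up by name so this code also parses and runs on
  # platforms where setInternet2 is not defined at all.
  tryCatch({
    get("setInternet2", mode = "function")(TRUE)
    TRUE
  }, error = function(e) FALSE)
}

applied <- enable_internet2()
```

The tryCatch() is there on the assumption that some R versions may reject the call; in that case the helper simply returns FALSE.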

ndou
1

I had the same problem with an https URL: the following code ran perfectly in R but gave an "unsupported URL scheme" error when knitting to HTML:

temp = tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip", temp)
data = read.csv(unz(temp, "activity.csv"), colClasses = c("numeric", "Date", "numeric"))

I tried all the solutions posted here and nothing worked. In absolute desperation I just removed the "s" from "https" in the URL, and everything worked.
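For what it is worth, the scheme swap can be written as a one-line helper. The name to_http is hypothetical, and the approach only helps when the server also serves the same file over plain http, which is an assumption that will not hold for every site.

```r
# Hypothetical helper: rewrite a leading https:// to http://
to_http <- function(url) {
  sub("^https://", "http://", url, ignore.case = TRUE)
}

to_http("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip")
```

URLs that already use http are returned unchanged, so the helper can be applied to every download in a document.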

user2500444
1

Using the R downloader package, which provides the download() function, takes care of the quirky details typically associated with file downloads. For your example, all you would have needed to do is:

```{r}
library(downloader)
fileurl <- "https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv"
download(fileurl, destfile = "C:/Users/xxx/yyy")
```
Michael Szczepaniak