18

Now that the whole world is clamoring to use SSL all the time (a decision that makes a lot of sense), some of us who have used GitHub and related services to store CSV files have a bit of a challenge. The read.csv() function does not support SSL when reading from a URL. To get around this I'm doing a little dance I like to call the SSL kabuki dance: I grab the text file with RCurl, write it to a temp file, then read it with read.csv(). Is there a smoother way of doing this? Better workarounds?

Here's a simple example of the SSL kabuki:

require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
temporaryFile <- tempfile()
con <- file(temporaryFile, open = "w")
cat(myCsv, file = con) 
close(con)

read.csv(temporaryFile)
JD Long

6 Answers

14

No need to write it to a file; just use textConnection():

require(RCurl)
myCsv <- getURL("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
WhatJDwants <- read.csv(textConnection(myCsv))
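
The same idea can be tried without network access. This is a minimal sketch, assuming the string returned by getURL() looks like the sample below; the names csv_text, from_connection and from_text are illustrative only. It also shows the text= argument of read.csv(), which in base R does the same job without an explicit connection.

```r
# Stand-in for the string that getURL() would return (made-up sample data).
csv_text <- "a,b\n1,2\n2,3\n3,4\n4,5\n"

# Option 1: wrap the string in a text connection, as in the answer above.
con <- textConnection(csv_text)
from_connection <- read.csv(con)
close(con)  # a text connection is created open, so close it explicitly

# Option 2: pass the string through the text= argument of read.csv().
from_text <- read.csv(text = csv_text)

print(from_connection)
print(identical(from_connection, from_text))
```

Either way nothing is written to disk; the second form simply saves the bookkeeping of closing the connection.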
Sean
12

Using Dirk's advice to explore method="" resulted in this slightly more concise approach which does not depend on the external RCurl package.

temporaryFile <- tempfile()
download.file("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv",
              destfile = temporaryFile, method = "curl")
read.csv(temporaryFile)

But it appears that I can't just set options("download.file.method"="curl") and have read.csv() read the https URL directly.
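
The download-then-read steps above can be wrapped in a small helper. This is only a sketch: the function name read_csv_https is hypothetical, and the default method = "curl" assumes the external curl program is installed and on the PATH.

```r
# Hypothetical helper: download a CSV to a temporary file, then read it.
read_csv_https <- function(url, method = "curl", ...) {
  tmp <- tempfile(fileext = ".csv")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file when done

  # download.file() returns 0 (invisibly) on success.
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) {
    stop("download failed with status ", status)
  }

  # Extra arguments (e.g. stringsAsFactors) are passed on to read.csv().
  read.csv(tmp, ...)
}

# Usage (needs network access and curl, so it is left commented out):
# read_csv_https("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
```

The on.exit() call means the temporary file is cleaned up even if read.csv() fails.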

JD Long
8

Yes -- see help(download.file) which is pointed to by read.csv() and all its cousins. The method= argument there has:

method Method to be used for downloading files. Currently download methods "internal", "wget", "curl" and "lynx" are available, and there is a value "auto": see ‘Details’. The method can also be set through the option "download.file.method": see options().

and you can then set this option via options():

download.file.method: Method to be used for download.file. Currently download methods "internal", "wget" and "lynx" are available. There is no default for this option, when method = "auto" is chosen: see download.file.

to turn to the external program curl, rather than the RCurl package.

Edit: Looks like I was half-right and half-wrong. read.csv() et al. do not use the selected method; one needs to call download.file() manually (which then uses curl or whichever method was selected). Other functions that do use download.file() (such as package installation or updates) will profit from setting the option, but for JD's initial query concerning csv files over https, an explicit download.file() is needed before read.csv() of the downloaded file.
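
To make the distinction concrete, here is a hedged sketch of setting the option and then calling download.file() without a method= argument. The variable names and the example.com URL are placeholders, and the download itself is commented out because it needs network access and a curl binary.

```r
# Remember the previous setting so it can be restored afterwards.
old_opts <- options(download.file.method = "curl")

# download.file() calls that omit method= now consult this option.
chosen_method <- getOption("download.file.method")
print(chosen_method)

# Hypothetical usage (placeholder URL; requires network access and curl):
# tmp <- tempfile()
# download.file("https://example.com/data.csv", destfile = tmp)
# read.csv(tmp)

# Restore whatever was set before.
options(old_opts)
```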

Dirk Eddelbuettel
  • The help page for download.file says "https:// connections are not supported". Are you saying that specifying options(download.file.method="curl") will cure that problem? – IRTFM Nov 08 '10 at 18:42
  • Yes, as R will then 'farm out' to curl rather than using its own minimal http/ftp client code. – Dirk Eddelbuettel Nov 08 '10 at 18:46
  • @DWin that help page states the line you quote refers only to `method = "internal"`. – Gavin Simpson Nov 08 '10 at 18:58
  • Why in the world does `download.file(testData, tmpFile, mode = 'wb', method = 'curl')`, where `testData` is a 4MB+ zip file, result in a 24KB file? – Aleksandr Blekh Aug 07 '14 at 03:08
  • Never mind, figured it out: by default, GitHub truncates large files in a repository. Full file is available by adding `?raw=true` to the URL. – Aleksandr Blekh Aug 07 '14 at 03:22
  • Note that if you get the unsupported connection error, make sure that "curl" is installed on the system. –  Jan 12 '15 at 03:28
6

R core should open up the R connections as a C API. I've proposed this in the past:

https://stat.ethz.ch/pipermail/r-devel/2006-October/043056.html

with no response.

Jeff
  • Very true and an issue we better get resolved one day, but not strictly speaking related to the question here, is it? ;-) – Dirk Eddelbuettel Nov 08 '10 at 16:47
  • Yes, it's related, because one can make an https ssl connection using the proposed Connections API. That way, one could use url("https://..."), etc. – Jeff Nov 08 '10 at 17:42
2

Given that this question comes up a lot, I've been working on a package to seamlessly handle HTTPS/SSL data. The package is called rio. A version of it is on CRAN but the newest version that now supports this is only available on GitHub. Once you've installed the package, you can read in data in one line:

# install and load rio
library("devtools")
install_github("leeper/rio")
library("rio")

# import
import("https://gist.github.com/raw/667867/c47ec2d72801cfd84c6320e1fe37055ffe600c87/test.csv")
##   a b
## 1 1 2
## 2 2 3
## 3 3 4
## 4 4 5

Basically, import() handles the manual download (using curl) and then infers the file format from the file extension, thus creating a data frame without needing to know what function to use or how to download it.
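
The idea of inferring the format from the file extension can be illustrated in base R. This is not rio's actual implementation, just a rough sketch under that assumption; the function name import_by_extension and the handful of supported extensions are made up for illustration.

```r
# Hypothetical sketch: pick a reader based on the file extension.
import_by_extension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path),
    tsv = read.delim(path),
    rds = readRDS(path),
    stop("unsupported file extension: ", ext)
  )
}

# Small local demonstration with a temporary CSV file (made-up data).
demo_path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "2,3"), demo_path)
demo_data <- import_by_extension(demo_path)
print(demo_data)
unlink(demo_path)
```

Note that tools::file_ext() finds no extension when a URL ends in a query string such as ?raw=true, so a real implementation would need to handle that case.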

Thomas
0

I found that since Dropbox changed the way that they present links with https://, none of the above solutions work anymore. Fortunately, I wasn't the first to make this discovery, and a solution was posted by Christopher Gandrud on r-bloggers:

http://www.r-bloggers.com/dropbox-r-data/

That approach works for me, after installing the repmis package and its dependencies.