Download a file from HTTPS using download.file()

Question

I would like to read online data into R using download.file(), as shown below.

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")

Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.

The error I get is:

Warning messages:
1: running command 'curl  "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"  -o "./data/data.csv"' had status 127 
2: In download.file(URL, destfile = "./data/data.csv", method = "curl",  :
  download had nonzero exit status

Appreciate your help.

What is the problem you are seeing? Does it fail with some error, or does it not return to the console at all? Does it show a progress bar which does not update? The extra information will help in diagnosing the problem. – musically_ut, Apr 12 '14 at 09:51
@sgibb Hi. I am using Windows 8 and R 3.0.3, and it said I do not have the package curl. – useR, Apr 12 '14 at 23:58
You should change your title to something like "Downloading file from https in R". – Prabhu, Sep 16 '14 at 05:13

score 48 · Accepted Answer · edited Aug 07 '15 at 12:53

It might be easiest to try the RCurl package. Install the package and try the following:

# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or 
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
#   RT SERIALNO DIVISION PUMA REGION ST
# 1  H      186        8  700      4 16
# 2  H      306        8  700      4 16
# 3  H      395        8  100      4 16
# 4  H      506        8  700      4 16
# 5  H      835        8  800      4 16
# 6  H      989        8  700      4 16
dim(out)
# [1] 6496  188

# Or, on an R build with libcurl support, download.file() can fetch https directly:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",
              destfile = "reviews.csv", method = "libcurl")

edited Aug 07 '15 at 12:53

m0nhawk

answered Apr 12 '14 at 10:42

A5C1D2H2I1M1N2O1R2T1

Hi Ananda. But I got an error message when using getURL(URL), as below: `> x <- getURL(URL)` Error in function (type, msg, asError = TRUE) : SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed – useR Apr 13 '14 at 00:02
@Yin, you could try adding `ssl.verifypeer = FALSE` to the `getURL` statement. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 03:17
@Yin, if the answer was helpful do consider up-voting it. If it solved your problem, do consider accepting it. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 09:49
It's not the solution! status 127 means "command not found". S/he just needs to install CURL! – Muktadir Jun 18 '14 at 23:15
@Muktadir, and if you refer to the comments under the question, you would see that I asked them that already. – A5C1D2H2I1M1N2O1R2T1 Jun 19 '14 at 15:17

arvi1000 · Answer 2 · 2015-01-14T16:04:36.630

20

Here's an update as of Nov 2014. I found that setting method='curl' did the trick for me (while method='auto' did not).

For example:

# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip')

# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='auto')

# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='curl')

edited Jan 14 '15 at 16:04

answered Nov 12 '14 at 03:17

arvi1000

I am getting "curl not found". – Arun Raja Nov 09 '15 at 04:48
Maybe you don't have curl on your system then. On Mac OS at least, you can run `system('curl -V')` in R (must be capital 'V') to check your curl version – arvi1000 Nov 09 '15 at 15:44
https://lehd.ces.census.gov/data/lodes/LODES7/ut/wac/ut_wac_S000_JT00_2013.csv.gz does not work for me. Code: `install.packages("RCurl"); library(RCurl); download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip', destfile='localfile.zip', method='curl')` Resulting errors: Warning messages: 1: running command 'curl "https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip" -o "localfile.zip"' had status 127 2: In download.file(url = "https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip", : download had nonzero exit status – Mox Jan 19 '17 at 19:04

score 6 · Answer 3 · answered Apr 12 '14 at 14:27

I succeeded with the following code:

url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)

Note that I've changed the protocol from https to http, since https doesn't seem to be supported by R's built-in URL reading here.
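The substitution itself can be sketched as follows. The helper name to_http is made up for illustration, and this only helps where the server happens to serve the same file over plain http, which is not guaranteed.

```r
# Hypothetical helper (not from any package): swap the URL scheme
# from https to http, leaving every other part of the URL untouched.
to_http <- function(url) {
  sub("^https://", "http://", url)
}

url <- to_http("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv")
print(url)

# Reading the data needs network access, so it is left commented out:
# x <- read.csv(url)
```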

answered Apr 12 '14 at 14:27

Baumann

The problem with this "solution" is that not all https urls can be substituted with http. The "RCurl" package typically does a good job with a lot of these situations. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 09:51
That's not a solution to the problem. Workarounds should only be considered when you are not able to solve the problem. – Muktadir Jun 18 '14 at 23:12
This solves the problem and doesn't require installing external dependencies or messing with SSL certificates. It may not work in all cases, but it works in this one. – bonh May 14 '17 at 17:31

score 4 · Answer 4 · edited Aug 07 '15 at 12:53

If you get an SSL error from the getURL() function when using RCurl, set these options before calling getURL(). This sets the RCurl SSL options globally for the session.

The extended code:

install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))   
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)

Worked for me on Windows 7 64-bit using R 3.1.0!

edited Aug 07 '15 at 12:53

m0nhawk

answered Jun 21 '14 at 08:49

user3762466

Can you reformat this by pressing Control+K instead of using backquotes. – bhathiya-perera Sep 29 '14 at 01:54
This is a great answer! Is there a way to set those options to be the default persisting across different R sessions? – Giuseppe Romagnuolo Nov 09 '15 at 22:53
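On persisting the setting across sessions: one possible approach is to put the options() call in your user-level .Rprofile, which R runs at startup. This is a sketch, assuming RCurl is installed and still ships CurlSSL/cacert.pem. The nzchar() guard is an addition for illustration, so the snippet does nothing harmful if the file is missing.

```r
# Lines one could place in ~/.Rprofile (assumption: RCurl is installed).
# system.file() returns "" when the package or file cannot be found.
cainfo <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(cainfo)) {
  # Same option as in the answer above, now set at the start of every session
  options(RCurlOptions = list(cainfo = cainfo))
} else {
  message("RCurl CA bundle not found; RCurlOptions left unchanged")
}
```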

score 4 · Answer 5 · answered Sep 21 '16 at 23:44

Offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found that it halved the transfer times and was much more reliable than download.file.

# install.packages("curl")
library(curl)
# install.packages("RCurl")
library(RCurl)

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"

# 1. RCurl::getURL() reads the file into memory as a string
ptm <- proc.time()
x <- getURL(URL)
proc.time() - ptm

# 2. curl::curl_download() writes the file to disk
ptm1 <- proc.time()
curl_download(url = URL, destfile = "TEST.CSV", quiet = FALSE, mode = "wb")
proc.time() - ptm1

# 3. download.file() via the external curl program (./data must already exist)
ptm2 <- proc.time()
y <- download.file(URL, destfile = "./data/data.csv", method = "curl")
proc.time() - ptm2

In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and increased the reliability from 50% to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier, see .
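The same kind of rough timing can be wrapped in a small helper built on system.time(). This is a minimal sketch: time_call and fake_download are hypothetical names, and the stand-in function only sleeps so that the example runs without network access.

```r
# Hypothetical helper: run a zero-argument function and return
# the elapsed wall-clock time in seconds.
time_call <- function(f) {
  unname(system.time(f())["elapsed"])
}

# Stand-in for a real download (assumption: replace with your own call)
fake_download <- function() Sys.sleep(0.1)

elapsed <- time_call(fake_download)
message("Elapsed seconds: ", round(elapsed, 2))

# With real downloads, each method can be wrapped the same way, e.g.
# time_call(function() curl::curl_download(URL, "TEST.CSV"))
# time_call(function() download.file(URL, "data.csv", method = "curl"))
```

Timing several runs per method and comparing the spread would give a fairer picture than a single measurement.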

I was downloading `ts` format files and `curl_download` was able to download the file correctly without corruption compared to `getURL` and `download.file` — KKW, Oct 30 '22 at 02:11

score 3 · Answer 6 · answered Apr 19 '17 at 19:28

Try the following for large files:

library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)

answered Apr 19 '17 at 19:28

zyduss

score 2 · Answer 7 · answered Jun 18 '14 at 23:10

Exit status 127 means "command not found".

In your case, the curl command was not found, which means curl is either not installed or not on your PATH.

You need to install or reinstall curl. That's all. Get the latest version for your OS from http://curl.haxx.se/download.html

Close RStudio before installation.
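Whether R can actually see curl can be checked from within R. The sketch below uses Sys.which(), which returns an empty string when an executable is not on the PATH. The helper name pick_download_method is made up for illustration, and the preference for "libcurl" assumes an R build that reports capabilities("libcurl") as TRUE.

```r
# Hypothetical helper (not part of base R): choose a download.file()
# method based on what is available on this machine.
pick_download_method <- function() {
  # Sys.which() returns "" when the executable is not found on the PATH
  has_curl_binary <- nzchar(unname(Sys.which("curl")))

  # Older R versions may not list "libcurl" in capabilities() at all
  caps <- capabilities()
  has_libcurl <- "libcurl" %in% names(caps) && isTRUE(unname(caps["libcurl"]))

  if (has_libcurl) {
    "libcurl"   # built into R, no external program needed
  } else if (has_curl_binary) {
    "curl"      # shells out to the curl executable
  } else {
    "auto"      # let R decide; https may still fail on old setups
  }
}

method <- pick_download_method()
message("Using download method: ", method)

# Needs network access, so left commented out:
# download.file(URL, destfile = "data.csv", method = method)
```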

score 2 · Answer 8 · edited Jun 20 '18 at 17:49

I had exactly the same problem as useR (original question); I'm also using Windows 7. I tried all the proposed solutions and they didn't work.

I resolved the problem as follows:

Using RStudio instead of the R console.
Updating R (from 3.1.0 to 3.1.1) so that the RCurl library runs properly on it. (I'm now using 32-bit R 3.1.1, although my system is 64-bit.)
Typing the URL as https (secure connection) and with / instead of backslashes \\.
Setting method = "auto".

It works for me now. You should see the message:

Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 bytes

score 1 · Answer 9 · answered Jun 07 '15 at 13:50

You can set a global option and then call download.file() without a method argument:

options(download.file.method = "curl")
# method is left unset so that the option above is picked up
download.file(URL, destfile = "./data/data.csv")

For details on the issue, see https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html

answered Jun 07 '15 at 13:50

akshat thakar

score 1 · Answer 10 · answered Feb 11 '22 at 09:13

Downloading files through the httr package also works:

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"

httr::GET(URL,
          httr::write_disk(path = basename(URL),
                           overwrite = TRUE))

answered Feb 11 '22 at 09:13

andschar
