38

I would like to read online data into R using download.file() as shown below.

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")

Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.

The error I get is:

Warning messages:
1: running command 'curl  "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"  -o "./data/data.csv"' had status 127 
2: In download.file(URL, destfile = "./data/data.csv", method = "curl",  :
  download had nonzero exit status

Appreciate your help.

k-dubs
useR

10 Answers

48

It might be easiest to try the RCurl package. Install the package and try the following:

# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or 
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
#   RT SERIALNO DIVISION PUMA REGION ST
# 1  H      186        8  700      4 16
# 2  H      306        8  700      4 16
# 3  H      395        8  100      4 16
# 4  H      506        8  700      4 16
# 5  H      835        8  800      4 16
# 6  H      989        8  700      4 16
dim(out)
# [1] 6496  188

Alternatively, using download.file() with the libcurl method:

download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",
              destfile = "reviews.csv", method = "libcurl")
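The parsing step in the RCurl code can be tried without any network access. This is only a minimal sketch: the string `x` is made-up toy data standing in for whatever getURL(URL) returns, and it uses the `text` argument of read.csv() as an alternative to wrapping the string in textConnection().

```r
# Toy CSV text standing in for the string returned by getURL(URL)
# (made-up sample, not the real file contents)
x <- "RT,SERIALNO,DIVISION\nH,186,8\nH,306,8\n"

# read.csv() can parse a character string directly via its `text` argument,
# which avoids creating an explicit textConnection()
out <- read.csv(text = x, stringsAsFactors = FALSE)

dim(out)
str(out)
```

Either form should give the same data frame; `text =` is just slightly shorter to write.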
m0nhawk
A5C1D2H2I1M1N2O1R2T1
  • Hi Ananda. I got the following error message when using getURL(URL): > x <- getURL(URL) Error in function (type, msg, asError = TRUE) : SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed – useR Apr 13 '14 at 00:02
  • 4
    @Yin, you could try adding `ssl.verifypeer = FALSE` to the `getURL` statement. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 03:17
  • @Yin, if the answer was helpful do consider up-voting it. If it solved your problem, do consider accepting it. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 09:49
  • It's not the solution! status 127 means "command not found". S/he just needs to install CURL! – Muktadir Jun 18 '14 at 23:15
  • @Muktadir, and if you refer to the comments under the question, you would see that I asked them that already. – A5C1D2H2I1M1N2O1R2T1 Jun 19 '14 at 15:17
20

Here's an update as of Nov 2014. I find that setting method='curl' does the trick for me (while method='auto' does not).

For example:

# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip')

# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='auto')

# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='curl')
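Since which method works seems to depend on the machine, one option is to check what is available before calling download.file(). The following is a rough sketch, not a definitive recipe: `pick_download_method()` is a hypothetical helper name, and the preference order (libcurl, then the external curl binary, then "auto") is my own assumption.

```r
# Hypothetical helper: pick a download.file() method that is likely to be
# usable on this machine. The preference order is an assumption.
pick_download_method <- function() {
  # capabilities("libcurl") reports whether this R build has libcurl support
  if (isTRUE(unname(capabilities("libcurl")))) {
    return("libcurl")
  }
  # Sys.which() returns "" when the executable is not on the PATH
  if (nzchar(Sys.which("curl"))) {
    return("curl")
  }
  "auto"
}

method <- pick_download_method()
message("Using method: ", method)

# Then, for example (not run here because it needs network access):
# download.file(url = 'https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
#               destfile = 'localfile.zip', method = method, mode = "wb")
```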
arvi1000
  • I am getting "curl not found". – Arun Raja Nov 09 '15 at 04:48
  • Maybe you don't have curl on your system then. On Mac OS at least, you can run `system('curl -V')` in R (must be capital 'V') to check your curl version – arvi1000 Nov 09 '15 at 15:44
  • `https://lehd.ces.census.gov/data/lodes/LODES7/ut/wac/ut_wac_S000_JT00_2013.csv.gz` does not work for me. I ran `install.packages("RCurl")`, `library(RCurl)` and `download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip', destfile='localfile.zip', method='curl')`. Resulting errors: Warning messages: 1: running command 'curl "https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip" -o "localfile.zip"' had status 127 2: In download.file(url = "https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip", : download had nonzero exit status – Mox Jan 19 '17 at 19:04
6

I've succeeded with the following code:

url = "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x = read.csv(file=url)

Note that I've changed the protocol from https to http, since https doesn't seem to be supported in R.
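If there are several URLs to adjust, the protocol swap can be done programmatically. This is just a sketch: `to_http()` is a hypothetical helper, and it assumes the server also serves the same file over plain http, which is not true of every site.

```r
# Hypothetical helper: rewrite a leading https:// to http://
# (only useful when the server also answers over plain http)
to_http <- function(url) sub("^https://", "http://", url)

secure_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
plain_url <- to_http(secure_url)
plain_url

# Then, for example (not run here because it needs network access):
# x <- read.csv(file = plain_url)
```

URLs that already start with http:// are left unchanged.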

Baumann
  • 3
    The problem with this "solution" is that not all https urls can be substituted with http. The "RCurl" package typically does a good job with a lot of these situations. – A5C1D2H2I1M1N2O1R2T1 Apr 13 '14 at 09:51
  • 3
    That's not a solution to the problem. Workarounds should only be considered when you are not able to solve the problem. – Muktadir Jun 18 '14 at 23:12
  • 2
    This solves the problem and doesn't require installing external dependencies or messing with SSL certificates. It may not work in all cases, but it works in this one. – bonh May 14 '17 at 17:31
4

If you get an SSL error from getURL() when using RCurl, set these options before calling getURL(). This sets the curl SSL options globally.

The extended code:

install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))   
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)

Worked for me on Windows 7 64-bit using R 3.1.0!
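Before setting the option, it may be worth checking that the certificate bundle is actually there, because system.file() returns an empty string when the package or the file is missing. A small sketch, assuming your RCurl installation may or may not ship `CurlSSL/cacert.pem`:

```r
# Look up the CA bundle that RCurl may ship; system.file() returns ""
# when the package is not installed or the file does not exist
ca_bundle <- system.file("CurlSSL", "cacert.pem", package = "RCurl")

if (nzchar(ca_bundle)) {
  # Same global setting as above, but only applied when the file was found
  options(RCurlOptions = list(cainfo = ca_bundle))
} else {
  message("No bundled cacert.pem found; RCurlOptions left unchanged.")
}
```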

m0nhawk
4

I'm offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found that it halved the transfer times and was much more reliable than download.file.

# install.packages(c("curl", "RCurl"))
library(curl)
library(RCurl)

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"

# Time RCurl::getURL(), which returns the file contents as a string
ptm <- proc.time()
x <- getURL(URL)
proc.time() - ptm

# Time curl::curl_download(), which writes straight to disk
ptm1 <- proc.time()
curl_download(url = URL, destfile = "TEST.CSV", quiet = FALSE, mode = "wb")
proc.time() - ptm1

# Time download.file() with the external curl binary
# (the ./data directory must already exist)
ptm2 <- proc.time()
y <- download.file(URL, destfile = "./data/data.csv", method = "curl")
proc.time() - ptm2

In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and increased the reliability from 50% to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier, see .
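For this kind of rough timing, system.time() is a more compact alternative to subtracting proc.time() values by hand. A minimal sketch: `time_call()` is a hypothetical helper, and the Sys.sleep() call only stands in for a real download so that the snippet runs offline.

```r
# Hypothetical helper: elapsed wall-clock seconds for one call of `f`
time_call <- function(f) {
  unname(system.time(f())[["elapsed"]])
}

# Stand-in task so the example runs without network access; in practice
# this could be something like
# function() curl::curl_download(URL, destfile = "TEST.CSV", quiet = FALSE)
elapsed <- time_call(function() Sys.sleep(0.2))
elapsed
```

A single run is noisy, so repeating each download a few times gives a fairer comparison.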

  • I was downloading `ts` format files and `curl_download` was able to download the file correctly without corruption compared to `getURL` and `download.file` – KKW Oct 30 '22 at 02:11
3

Try the following for large files:

library(data.table)
URL <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- fread(URL)
zyduss
2

Exit status 127 means "command not found".

In your case, it is the curl command that was not found, so curl is either not installed or not on your PATH.

You need to install or reinstall curl. Get the latest version for your OS from http://curl.haxx.se/download.html

Close RStudio before installation.
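You can check from within R whether the curl executable is visible. A small sketch using base R's Sys.which(), which returns an empty string when nothing is found on the PATH:

```r
# Look for the curl executable on the PATH; "" means it was not found
curl_path <- Sys.which("curl")

if (nzchar(curl_path)) {
  message("curl found at: ", curl_path)
  # Optionally print the version (not run here):
  # system2("curl", "--version")
} else {
  message("curl is not on the PATH; download.file(method = \"curl\") ",
          "will fail with status 127.")
}
```

If the second message appears after installing curl, restarting R or RStudio so that it picks up the updated PATH is worth trying.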

Muktadir
2

I had exactly the same problem as useR (original question), and I'm also using Windows 7. I tried all the proposed solutions and they didn't work.

I resolved the problem doing as follows:

  1. Using RStudio instead of R console.

  2. Updating R (from 3.1.0 to 3.1.1) so that the RCurl library runs OK on it. (I'm now using R 3.1.1 32-bit, although my system is 64-bit.)

  3. I typed the URL with https (secure connection) and with forward slashes / instead of backslashes \\.

  4. Setting method = "auto".

It works for me now. You should see the message:

Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 by
Kevin Panko
JeromeROD
1

You can set a global option and try:

options('download.file.method'='curl')
download.file(URL, destfile = "./data/data.csv", method="auto")

For details on the issue, see: https://stat.ethz.ch/pipermail/bioconductor/2011-February/037723.html
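A sketch of setting the option temporarily and restoring it afterwards, so the change does not leak into the rest of the session. One caveat, as far as I understand download.file(): the option is only consulted when the `method` argument is omitted, so the call in the sketch leaves `method` out; treat that as an assumption to verify on your R version.

```r
# Set the option and keep the previous value so it can be restored
old <- options(download.file.method = "curl")
current <- getOption("download.file.method")
current

# With `method` omitted, download.file() falls back to the option
# (not run here because it needs network access and the curl binary):
# download.file(URL, destfile = "./data/data.csv")

# Restore whatever was set before
options(old)
```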

akshat thakar
1

Downloading files with the httr package also works:

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"

httr::GET(URL,
          httr::write_disk(path = basename(URL),
                           overwrite = TRUE))
andschar