27

Consider a file on the internet (like this one (note the s in https) https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls

How can the sheet 2 of the file be read into R?

The following code is approximation of what is desired (but fails)

url1<-'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
p1f <- tempfile()
download.file(url1, p1f, mode="wb")
p1<-read_excel(path = p1f, sheet = 2)
Jaap
  • 81,064
  • 34
  • 182
  • 193
userJT
  • 11,486
  • 20
  • 77
  • 88
  • 1
    per this link even download.file() should not be necessary but I can't make it work. https://github.com/hadley/readxl/pull/77 – userJT Dec 28 '16 at 19:47
  • 3
    I don't think `read_excel` is capable of handling Excel workbook files that do not have a `.xls` extension. – IRTFM Dec 28 '16 at 20:12

4 Answers4

27

This works for me on Windows:

library(readxl)
library(httr)
packageVersion("readxl")
# [1] ‘0.1.1’

GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 2L)
str(df)
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20131 obs. of  8 variables:
# $ Code                        : chr  "C115388" "C115800" "C115801" "C115802" ...
# $ Codelist Code               : chr  NA "C115388" "C115388" "C115388" ...
# $ Codelist Extensible (Yes/No): chr  "No" NA NA NA ...
# $ Codelist Name               : chr  "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" "6 Minute Walk Functional Test Test Code" ...
# $ CDISC Submission Value      : chr  "SIXMW1TC" "SIXMW101" "SIXMW102" "SIXMW103" ...
# $ CDISC Synonym(s)            : chr  "6 Minute Walk Functional Test Test Code" "SIXMW1-Distance at 1 Minute" "SIXMW1-Distance at 2 Minutes" "SIXMW1-Distance at 3 Minutes" ...
# $ CDISC Definition            : chr  "6 Minute Walk Test test code." "6 Minute Walk Test - Distance at 1 minute." "6 Minute Walk Test - Distance at 2 minutes." "6 Minute Walk Test - Distance at 3 minutes." ...
# $ NCI Preferred Term          : chr  "CDISC Functional Test 6MWT Test Code Terminology" "6MWT - Distance at 1 Minute" "6MWT - Distance at 2 Minutes" "6MWT - Distance at 3 Minutes" ...
lukeA
  • 53,097
  • 5
  • 97
  • 100
14

A simpler solution is using the openxlsx package. Here is an example, which can be adapted to your needs:

library(openxlsx)
df = read.xlsx("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx",sheet=1)
Abdinardo Oliveira
  • 2,410
  • 1
  • 5
  • 12
9

From this issue on Github (#278):

some functionality for supporting more general inputs will be pulled out of readr, at which point readxl can exploit that.

So we should be able to pass urls directly to read_excel() in the (hopefully near) future.

Aurèle
  • 12,545
  • 1
  • 31
  • 49
  • 6
    As of August 2020, this issue is still open. Subsequently, `read_excel()` will not yet read `.xls` files directly from the web. – Gabriel J. Odom Aug 26 '20 at 22:52
8

use rio R package. link. Here a reprex:

library(tidyverse)
library(rio)
url <- 'https://evs.nci.nih.gov/ftp1/CDISC/SDTM/SDTM%20Terminology.xls'
rio::import(file = url,which = 2) %>% 
  glimpse()
#> 
#> Rows: 30,995
#> Columns: 8
#> $ Code                           <chr> "C141663", "C141706", "C141707"...
#> $ `Codelist Code`                <chr> NA, "C141663", "C141663", "C141...
#> $ `Codelist Extensible (Yes/No)` <chr> "No", NA, NA, NA, "No", NA, NA,...
#> $ `Codelist Name`                <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Submission Value`       <chr> "A4STR1TC", "A4STR101", "A4STR1...
#> $ `CDISC Synonym(s)`             <chr> "4 Stair Ascend Functional Test...
#> $ `CDISC Definition`             <chr> "4 Stair Ascend test code.", "4...
#> $ `NCI Preferred Term`           <chr> "CDISC Functional Test 4 Stair ...
avallecam
  • 669
  • 8
  • 8
  • Works like a dream! Thanks :) – panuffel Jan 14 '21 at 16:53
  • I noted that when in you install `rio` package, it also installs `openxlsx`, which apparently it uses to open xls from URL. Hence I upvoted the answer with openxlsx not with rio. It seems also much faster with openxls! – IVIM Jul 05 '23 at 02:28
  • fair enough @IVIM I agree with that criterion. `rio`, as a wrapper, is my last resource when I can not read sth. If I need performance, better to use a more specific one. thanks for sharing! – avallecam Jul 18 '23 at 19:13