0

I am trying to open the .xls file from this link 'https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/regionalgvaibylocalauthorityintheuk/1997to2015/regionalgvaibylainuk.xls' with the package 'xlsx', but it does not seem to be working. I have tried other packages like 'gdata' unsuccessfully, too.

install.packages('xlsx')
require('xlsx')

file <- system.file('https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/regionalgvaibylocalauthorityintheuk/1997to2015/regionalgvaibylainuk.xls', package = "xlsx")
res <- read.xlsx(file, 5) # read the fifth sheet

I get this error: 'Error in loadWorkbook(file, password = password) : Cannot find'

I have also tried the function read.xls(), but it throws me an error, too.

Daniel
  • 471
  • 3
  • 8
  • 18
  • 1
    `system.file` is used to find a file within a package, but you are trying to read a file which is on a website, not it the xlsx package. Try downloading the file, and then reading it. – Kerry Jackson Apr 02 '19 at 12:59
  • Yes, but it has to be reproducible by other people, so I have to rely on something which everyone could have access to. – Daniel Apr 02 '19 at 13:00
  • Even if I try pasting the link on the read.xlsx function, I get this error: Error in read.xlsx("https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/regionalgvaibylocalauthorityintheuk/1997to2015/regionalgvaibylainuk.xls", : argument 2 matches multiple formal arguments – Daniel Apr 02 '19 at 13:01

1 Answers1

1

I think the answer from Read Excel file from a URL using the readxl package can be used here:

library(httr)
library(xlsx)

url1<-'https://www.ons.gov.uk/file?uri=/economy/grossvalueaddedgva/datasets/regionalgvaibylocalauthorityintheuk/1997to2015/regionalgvaibylainuk.xls'
GET(url1, write_disk(tf <- tempfile(fileext = ".xls")))
res <- read.xlsx(tf, 5)
Kerry Jackson
  • 1,821
  • 12
  • 20
  • Would you know how to set the first non-NA rows as your column names? – Daniel Apr 02 '19 at 13:16
  • I would say the easiest way is to use the `startRow` argument when you are reading the file, so the line would be `res <- read.xlsx(tf, 5, startRow = 3)`. This will have the structure of the data.frame determined based on the actual data. – Kerry Jackson Apr 02 '19 at 13:21