3

There are lots of good examples out there on how to read Microsoft Excel files into R with the XLConnect package, but I can't find any examples of how to read in an Excel file directly from a URL. The reproducible example below returns a "FileNotFoundException (Java)". But, I know the file exists because I can pull it up directly by pasting the URL into a browser.

fname <- "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
sheet <- c("Sheet1")
data  <- readWorksheetFromFile(fname, sheet, header=TRUE, startRow=11, startCol=2, endCol=13)

Although, the URL is prefixed with "https:" it is a public file that does not require a username or password.

I have tried to download the file first using download.file(fname, destfile="test.xls") and got a message that says it was downloaded but when I try to open it in Excel to check to see if it was successful i get a Excel popup box that says "..found unreadable content in 'test.xls'.

Below are the specifics of my system:

Computer: 64-bit Dell running Operating System: Windows 7 Professional R version: R-3.1.0

Any assistance would be greatly appreciated.

MikeTP
  • 7,716
  • 16
  • 44
  • 57
  • I think you need to download the file first. – rrs Jun 11 '14 at 14:45
  • I tried download.file(fname, destfile="test.xls") and I got a message that says it was downloaded but when I try to open it in Excel to check Exel says "..found unreadable content in 'test.xls'. Which is odd, because I can open it directly by pasting the URL into a browser. – MikeTP Jun 11 '14 at 19:04
  • yeah, i tried that too. not sure what's going on. – rrs Jun 11 '14 at 19:17
  • It seems like the issue is with the actual Excel file, not your code - a quick Google search turns up other people with similar Java error messages with corrupted Office files or out of date Java. I know it's not as nice as doing it programmatically but can you just copy the data and put in a CSV file? cran.r-project.org says: "The first piece of advice is to avoid doing so if possible! If you have access to Excel, export the data you want from Excel in tab-delimited or comma-separated form, and use read.delim or read.csv to import it into R." – Meaghan Fitzgerald Jun 25 '14 at 16:30
  • Thank you for your response. Copy-paste to CSV file is not practical in this situation as one of my main goals is to automate the collection and analysis of this daily time series. It's just simply not productive to do it manually every day. – MikeTP Jun 25 '14 at 20:07
  • Downloaded the file w download.file() w/o any problems. Opens in Excel 2010 ... Does show an error when trying to access it w XLConnect "Error: IOException (Java): Cannot remove block[ 2555904 ]; out of range[ 0 - 161 ]" So regarding the download and open w Excel part I guess that this seems has to do w your local (PC) settings. Regarding the XLConnect error: I converted it to .xlsx and accessed the data via package "openxlsx". The conversion - I did manually in this case - can be run automated via e.g. a vbs script called from within R that changes the extensions from .xls to .xlsx. – GWD Jul 01 '14 at 14:28

4 Answers4

2

Two things:

  1. Try using a different package--I know the gdata package's read.xls function has support for URLs

  2. Try loading in a publicly-available xls file to make sure it's not an issue with the particular website.

For instance, you can try:

library("gdata")
site <- "http://www.econ.yale.edu/~shiller/data/chapt26.xls"
data  <- read.xls(site, header=FALSE, skip=8)
head(data)
Steve S
  • 1,023
  • 7
  • 19
  • Thank you for your reply. Your example worked perfectly with your data but when I tried to modify it to read my data I get a different error. Despite having a "https:" URL prefix my data is publicly available but I suspect the "https:" URL prefix is part of the problem. XLConnect documentation suggest that it can accommodate "https:" so I was looking for a solution or example that utilizes XLConnect. Thanks again for your reply. – MikeTP Jun 11 '14 at 16:01
  • Are you using a Mac by any chance? – Steve S Jun 11 '14 at 16:06
  • No, I am using a 64-bit Dell running Windows 7 Professional – MikeTP Jun 11 '14 at 16:29
2

XLConnect does not support importing directly from URLs. You have to use e.g. download.file first to download the file to your local machine:

require(XLConnect)
tmp = tempfile(fileext = ".xls")
download.file(url = "http://www.econ.yale.edu/~shiller/data/chapt26.xls", destfile = tmp)
readWorksheetFromFile(file = tmp, sheet = "Data", header = FALSE, startRow = 9, endRow = 151)

or with your originally proposed URL:

require(XLConnect)
tmp = tempfile(fileext = ".xls")
download.file(url = "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls", destfile = tmp, method = "curl")
readWorksheetFromFile(file = tmp, sheet = "Sheet1", header = TRUE, startRow = 11, startCol = 2, endCol = 13)
Martin Studer
  • 2,213
  • 1
  • 18
  • 23
  • 1
    Martin Studer: Thank you for your reply. Even using your code (above) I did not have any success. It returned the following error: "Error: FileNotFoundException (Java): File 'file5140448e1d6.xls' could not be found - you may specify to automatically create the file if not existing." I don't really understand how to interpret that error message so perhaps it is related to my system (I am assuming you were successful in reading my URL). So maybe it is time to abandon XLConnect for this particular application and try something else. – MikeTP Jun 25 '14 at 15:13
  • @MikeTP : Perhaps you should try `download.file(url = "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls", destfile = tmp, mode = "wb")` Adding the `mode` did the trick for me. – Tamas Ferenci Dec 22 '16 at 11:38
2

You can use RCurl to download the file:

library(RCurl)
library(XLConnect)
appURL <- "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
f = CFILE("exfile.xls", mode="wb")
curlPerform(url = appURL, writedata = f@ref, ssl.verifypeer = FALSE)
close(f)
out <- readWorksheetFromFile(file = "exfile.xls", sheet = "Sheet1", header = TRUE
                      , startRow = 11, startCol = 2, endCol = 15, endRow = 35)
> head(out)
Col1 EEI Col3 IESO MHEB Col6 PJM SOCO SWPP TVA WAUE Col12 Other Total
1 Hour  1 272   NA  768 1671   NA 148  200  -52 198  280    NA   700  4185
2 Hour  2 272   NA  769 1743   NA 598  200  -29 190  267    NA   706  4716
3 Hour  3 272   NA  769 1752   NA 598  200  -28 194  267    NA   710  4734
4 Hour  4 272   NA  769 1740   NA 598  200  -26 189  266    NA   714  4722
5 Hour  5 272   NA  769 1753   NA 554  200  -27 189  270    NA   713  4693
6 Hour  6 602   NA  769 1682   NA 218  200  -32 223  286    NA   714  4662
jdharrison
  • 30,085
  • 4
  • 77
  • 89
1
library(relenium)
library(XML)
library(RCurl)

firefox=firefoxClass$new()
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
url=sprintf(url)
firefox$get(url)

This will open a Firefox instance within R and ask you to download the file, which you could then open in the next line of code. I don't know of any R utilities that will open an excel spreadsheet from HTTPS.

You could then set a delay while you're saving the file and then read the sheet from your downloads folder:

Sys.sleep(10)
sheet <- c("Sheet1")
data  <- readWorksheetFromFile(path, sheet, header=TRUE, startRow=11, startCol=2, endCol=13)
kng229
  • 473
  • 5
  • 13