
After receiving an answer from Montgomery Clift in another post (see here), I tried writing a function that loops over the days of a month to collect data from Baseball Prospectus (example page here). The code successfully downloads each day's file, but then I receive the following error:

Error in list_to_dataframe(res, attr(.data, "split_labels"), .id, 
id_as_factor) : Results must be all atomic, or all data frames

The function code followed by what I'm running to try and collect all the data:

library(XML)   # htmlParse, xmlRoot, getNodeSet, readHTMLTable
library(plyr)  # ldply

fetch_adjusted <- function(day) {
    fname <- paste0("standings201909", day, ".html")
    download.file(url = paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=2019-09-", day), destfile = fname)
    doc0 <- htmlParse(file = fname, encoding = "UTF-8")
    doc1 <- xmlRoot(doc0)
    doc2 <- getNodeSet(doc1, "//table[@id='content']")
    standings <- readHTMLTable(doc2[[1]], header = TRUE, skip.rows = 1, stringsAsFactors = FALSE)
    standings <- standings[[1]]
    standings$day <- day
    standings
}

Sept <- ldply(1:29, fetch_adjusted, .progress="text")

Can anyone help figure out how to adjust my current code so I can avoid any errors? Thank you!

UPDATE:

I'm now able to successfully download xls files from multiple dates within a span doing the following:

dates <- seq(as.Date("2019-09-01"), as.Date("2019-09-30"), by=1)

fetch_adjusted <- function(dates) {
 url <- paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=", dates, "&otype=xls")
 destfile <- "test.xls"
 download.file(url, destfile, mode = "wb")
}

But now, no matter what mode I use ("w", "wb", "a"), the files are not appended, so I end up with only the very last file (in this case, 2019-09-30), which is an empty spreadsheet. My guess is that each download overwrites the previous one. Is there a solution for this?

Abb
  • I removed the tag `PHP` from your question since it seems totally unrelated – brombeer Dec 19 '19 at 14:03
  • The problem is with this line: `download.file(url, destfile, mode = "wb")`. You are overwriting the same file. You can add a variable to `destfile` to make it unique. You'll end up with many files. You can then write a different function to read each file and append it to a single file. – Karthik Arumugham Dec 21 '19 at 06:31

1 Answer


Per Karthik's comment above, the following did the trick:

dates <- seq(as.Date("2019-09-01"), as.Date("2019-09-30"), by=1)

fetch_adjusted <- function(dates) {
  url <- paste0("https://legacy.baseballprospectus.com/standings/index.php?odate=", dates, "&otype=xls")
  destfile <- paste0("/Desktop/Test/", dates, ".xls")
  download.file(url, destfile, mode = "wb")
}

Sept <- ldply(dates, fetch_adjusted, .progress = "text")
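For the second half of Karthik's suggestion (reading each file back and appending everything into one table), here is a rough sketch rather than something I ran against the real downloads. The helpers `build_destfiles` and `combine_files` and the `source_file` column are names I made up for illustration. I also don't know for certain what format the site's "xls" files are in, so the reader is passed in as an argument; the stand-in files written below are tab-delimited only so the example runs offline. Real Excel workbooks would need a spreadsheet reader (for example `readxl::read_excel`) in place of `read.delim`.

```r
# Hypothetical helper: one unique destination path per date inside out_dir.
build_destfiles <- function(dates, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  file.path(out_dir, paste0(format(dates, "%Y-%m-%d"), ".xls"))
}

# Hypothetical helper: read every file with a caller-supplied reader and
# stack the results, tagging each row with the file it came from.
combine_files <- function(paths, reader) {
  pieces <- lapply(paths, function(p) {
    df <- reader(p)
    # rep() keeps this safe when a file yields zero rows
    df$source_file <- rep(basename(p), nrow(df))
    df
  })
  do.call(rbind, pieces)
}

dates   <- seq(as.Date("2019-09-01"), as.Date("2019-09-03"), by = 1)
out_dir <- file.path(tempdir(), "standings")
paths   <- build_destfiles(dates, out_dir)

# Stand-in for download.file(): write a tiny tab-delimited file per date.
# The columns here are invented and do not reflect the real standings data.
for (i in seq_along(paths)) {
  write.table(data.frame(team = c("A", "B"), wins = c(i, i + 1)),
              paths[i], sep = "\t", row.names = FALSE, quote = FALSE)
}

# Assumed reader for the stand-in files; swap for one matching the real format.
reader   <- function(p) read.delim(p, stringsAsFactors = FALSE)
combined <- combine_files(paths, reader)
```

With real downloads, the loop that writes the stand-in files would be replaced by the `download.file(url, destfile, mode = "wb")` call from `fetch_adjusted`, using `paths` as the destinations.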
Abb