I've completed the first couple of R courses on DataCamp, and to build up my skills I've decided to use R to prep for fantasy football this season, so I've begun playing around with the nflscrapR package.

With the nflscrapR package, one can pull game information using the season_games() function, which returns a data frame with the game ID, game date, home and away team abbreviations, and season.

Example:

games.2012 = season_games(2012)
head(games.2012)
      GameID       date home away season
1 2012090500 2012-09-05  NYG  DAL   2012
2 2012090900 2012-09-09  CHI  IND   2012
3 2012090908 2012-09-09   KC  ATL   2012
4 2012090907 2012-09-09  CLE  PHI   2012
5 2012090906 2012-09-09   NO  WAS   2012
6 2012090905 2012-09-09  DET  STL   2012

Initially, I copied and pasted the function call, manually changed the last digit for each season, and then combined all the seasons with rbind into one data frame, games:

games.2012 <- season_games(2012)
games.2013 <- season_games(2013)
games.2014 <- season_games(2014)
games.2015 <- season_games(2015)
games = rbind(games.2012, games.2013, games.2014, games.2015)

I'd like to write a function to simplify this process. My failed attempt:

gameID <- function(years) {
  for (i in years) {
    games[i] = season_games(years[i])
  }
}

Running it with years = list(2012, 2013) for testing purposes produced the following error:

Error in strsplit(headers, "\r\n") : non-character argument
Called from: strsplit(headers, "\r\n")

Thanks in advance!

2 Answers


While @Gregor has an apparent solution, he didn't run it, because the question isn't a minimal reproducible example. I googled, found, and tried to use the nflscrapR code, and it doesn't work, at least not in a reasonable amount of time.

Instead, here is an approach based on code from Vivek Patil's blog.

library(XML)

URLpart1 = "http://www.pro-football-reference.com/years/"
URLpart3 = "/games.htm"

#### Our workhorse function ####

getData = function(URLpart1, URLpart3) {
  weeklystats = NULL  # start empty; rbind(NULL, df) simply returns df, so no stray NA row
  for (i in 2012:2015) {
    URL = paste0(URLpart1, i, URLpart3)  # e.g. .../years/2012/games.htm
    tablefromURL = readHTMLTable(URL)     # list of all HTML tables on the page
    table = tablefromURL[[1]]             # first table, assumed to be the season's games
    names(table) = c("Week", "Day", "Date", "Blank", "Win.Team", "At", "Lose.Team",
                     "Points.Win", "Points.Lose", "YardsGained.Win", "Turnovers.Win",
                     "YardsGained.Lose", "Turnovers.Lose")  # naming columns
    table$Year = i                           # inserting a value for the year
    weeklystats = rbind(weeklystats, table)  # appending, oldest season first
  }
  return(weeklystats)
}
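A side note on the appending: getData() grows the data frame with rbind() on every pass through the loop. For four seasons that is harmless; judging by the timing output, almost all of the run time is spent waiting on the network (0.87 s of user time versus about 10.9 s elapsed). The sketch below compares that pattern with collecting the pieces in a list and binding once. It uses a hypothetical offline stand-in, fake_scrape(), with made-up rows in place of readHTMLTable(), so it runs without a connection.

```r
# Hypothetical offline stand-in for one scraped season table (made-up rows)
fake_scrape <- function(year) {
  data.frame(Week = c("1", "2"), Year = year, stringsAsFactors = FALSE)
}

# Pattern used in getData(): grow the result one season at a time
grow <- NULL
for (i in 2012:2015) {
  grow <- rbind(grow, fake_scrape(i))  # rbind(NULL, df) just returns df
}

# Alternative: collect the seasons in a list, then bind once at the end
collect <- do.call(rbind, lapply(2012:2015, fake_scrape))

nrow(grow)                          # 8
identical(grow$Year, collect$Year)  # TRUE
```

With many pieces, binding once avoids re-copying the growing data frame on every iteration; with four seasons either way is fine.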

I posted this because it works, you might learn something about web scraping you didn't know, and it runs in about 11 seconds:

system.time(weeklystats <- getData(URLpart1, URLpart3))
   user  system elapsed 
  0.870   0.014  10.926 
shayaa
  • Nice investigating! Indeed, didn't test my answer - I will happily guarantee that it works contingent on the `season_games` function working as reported in the question. – Gregor Thomas Jul 25 '16 at 23:20
  • This is great! While I'm mainly focusing on the nflscrapR package for now, I'm definitely bookmarking this to come back to explore thoroughly. Thanks again! – FabricatedSavant Jul 26 '16 at 01:55

You should probably take a look at some popular answers about working with lists, specifically "How do I make a list of data frames?" and "What's the difference between [ and [[?".

There's no reason to put your years in a list. They're just integers, so use a plain vector:

years = 2012:2015
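Here is a quick offline check of why the list version broke (plain base R, no nflscrapR needed). `for (i in years)` hands you the values 2012, 2013, ... rather than positions, so `years[i]` asks for element 2012 of a two-element object. On the question's list that returns a list holding NULL, which is most likely what season_games() choked on.

```r
years_list <- list(2012, 2013)  # how the question stored the years
years_vec  <- 2012:2013         # a plain integer vector

# [ on a list returns a (shorter) list; [[ extracts the element itself
class(years_list[1])    # "list"
class(years_list[[1]])  # "numeric"

# `for (i in x)` loops over the values of x, not over its positions
for (i in years_vec) print(i)  # prints 2012, then 2013

# so using that value as an index runs past the end
years_vec[2012]       # NA
years_list[2012]      # a one-element list whose element is NULL
seq_along(years_vec)  # 1 2: the positions, which is what indexing needs
```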

Then we can get your function to work (we'll need to initialize an empty list before the for loop):

gameID <- function(years) {
  games = list()
  for (i in seq_along(years)) {  # i runs over positions 1, 2, ..., not the years themselves
    games[[i]] = season_games(years[i])
  }
  return(games)
}

Read my link above for why we're using [[ with the list and [ with the vector. And we could run it like this:

game_list = gameID(2012:2015)
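Since season_games() needs nflscrapR and a network connection, here is a sketch you can run offline. fake_season_games() is a hypothetical stand-in, not part of nflscrapR: it only mimics the returned column names with two made-up games per season; swap in season_games for real use. The loop runs over seq_along(years), i.e. positions, and uses years[i] to get the actual season.

```r
# Hypothetical offline stand-in for nflscrapR::season_games() (made-up rows)
fake_season_games <- function(year) {
  data.frame(GameID = paste0(year, c("090500", "090900")),
             home = c("NYG", "CHI"), away = c("DAL", "IND"),
             season = year, stringsAsFactors = FALSE)
}

gameID <- function(years) {
  games <- list()                              # empty list to fill
  for (i in seq_along(years)) {                # i = 1, 2, ...: positions
    games[[i]] <- fake_season_games(years[i])  # years[i] = 2012, 2013, ...
  }
  games
}

game_list <- gameID(2012:2015)
length(game_list)      # 4
game_list[[2]]$season  # 2013 2013
```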

But this is such a simple function that it's easier to use lapply. Your function is just a wrapper around a for loop that returns a list, and that's precisely what lapply is too. But where your function has season_games hard-coded in, lapply can work with any function.

game_list = lapply(2012:2015, season_games)
# should be the same result as above
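To convince yourself that lapply really is that same loop, you can compare the two offline. This uses a hypothetical stand-in, fake_season_games(), in place of season_games(); it is not part of nflscrapR and just returns two made-up games per season.

```r
# Hypothetical offline stand-in for nflscrapR::season_games() (made-up rows)
fake_season_games <- function(year) {
  data.frame(GameID = paste0(year, c("090500", "090900")),
             home = c("NYG", "CHI"), away = c("DAL", "IND"),
             season = year, stringsAsFactors = FALSE)
}

yrs <- 2012:2015

# explicit loop, building the list by position
loop_list <- list()
for (i in seq_along(yrs)) loop_list[[i]] <- fake_season_games(yrs[i])

# lapply does the looping and list-building for us
lapply_list <- lapply(yrs, fake_season_games)

identical(loop_list, lapply_list)  # TRUE
```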

In either case, we have the list of data frames and want to combine it into one big data frame. The base R way is rbind with do.call, but dplyr and data.table have more efficient versions.

# pick your favorite
games = do.call(rbind, args = game_list)  # base
games = dplyr::bind_rows(game_list)
games = data.table::rbindlist(game_list)
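A minimal offline sketch of the base R option, using a hypothetical stand-in, fake_season_games(), in place of season_games() (made-up rows, not nflscrapR data). do.call() takes the list and passes its elements to rbind() as separate arguments:

```r
# Hypothetical offline stand-in for nflscrapR::season_games() (made-up rows)
fake_season_games <- function(year) {
  data.frame(GameID = paste0(year, c("090500", "090900")),
             home = c("NYG", "CHI"), away = c("DAL", "IND"),
             season = year, stringsAsFactors = FALSE)
}

game_list <- lapply(2012:2015, fake_season_games)

# same as rbind(game_list[[1]], game_list[[2]], game_list[[3]], game_list[[4]])
games <- do.call(rbind, game_list)

nrow(games)           # 8: two made-up games per season
unique(games$season)  # 2012 2013 2014 2015
```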
Gregor Thomas
  • Yeah... I've quickly realized I need to go back and review various sections of those introductory DataCamp courses. Thanks for your input! – FabricatedSavant Jul 26 '16 at 02:06