
I'm trying to extract data from the Basketball Reference website.

library(rvest)

# Read the page, select the element whose id is "roster", and parse it as a table
data7 <- read_html("http://www.basketball-reference.com/teams/CLE/2017.html") %>%
  html_nodes("[id=roster]") %>%
  html_table()
data7

The code above returns the data in the "roster" table. However, the following code does not return the "team_misc" table; instead it returns a list of length zero:

read_html("http://www.basketball-reference.com/teams/CLE/2017.html") %>%
  html_nodes("[id=team_misc]") %>%
  html_table()

I'm fairly new to rvest, so if anyone has an idea why this does not work, it would be greatly appreciated.

mmclean
  • did you poke around at the _plethora_ of SO R questions scraping data from this exact same site at all? – hrbrmstr Mar 01 '17 at 14:35
  • hrbrmstr - I searched for rvest, html_nodes, html_table, etc. but didn't realize how many posts there are about the Basketball Reference website. The following post may answer my question: http://stackoverflow.com/questions/41434984/readhtmltable-in-r-only-bringing-back-first-two-tables-from-basketball-reference – mmclean Mar 01 '17 at 14:45

1 Answer


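There is actually already an answer to this, but it applies to an older version of the website. The reason you cannot get the other tables is that they are not present in the raw HTML as regular table elements: the page ships them inside HTML comments (<!-- ... -->), and JavaScript in the browser turns them into visible tables. read_html() does not run JavaScript, so in R the tables you want exist only as commented-out strings, and html_nodes() cannot match their ids. You can see this by viewing the page source in Chrome (View Page Source rather than Inspect Element, which shows the already-rendered page). The other answer is here: "How to scrape tables inside a comment tag in html with R?"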
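The mechanism can be reproduced offline. The following minimal sketch uses a small, hypothetical HTML string (not the real Basketball Reference page) in which one table is visible and a second table, with made-up columns a and b, sits inside a comment. The CSS selector finds nothing for the commented table, while re-parsing the comment text recovers it. It assumes the rvest and xml2 packages are installed.

```r
library(rvest)  # read_html(), html_nodes(), html_node(), html_table(), %>%
library(xml2)   # xml_find_all(), xml_text()

# Hypothetical toy page: one visible table and one table hidden inside an
# HTML comment, mimicking how the real page ships most of its tables
page <- read_html('<html><body>
  <table id="roster"><tr><th>Player</th></tr><tr><td>Player A</td></tr></table>
  <!-- <table id="team_misc"><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table> -->
</body></html>')

# A comment is not an element, so the id selector matches nothing
visible <- html_nodes(page, "#team_misc")
length(visible)  # 0

# Collect the text of every comment node and re-parse it as HTML
hidden <- page %>%
  xml_find_all("//comment()") %>%
  xml_text() %>%
  paste0(collapse = "") %>%
  read_html()

# Now the table is a real element and can be selected by id
misc <- hidden %>% html_node("#team_misc") %>% html_table()
misc
```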

For the 2017 page in your question:

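library(rvest)  # read_html(), html_table()
library(xml2)   # xml_find_all(), xml_text()

A <- read_html('http://www.basketball-reference.com/teams/CLE/2017.html') %>% # read in the raw web page
  xml_find_all('//comment()') %>% # use XPath to find all comment nodes
  xml_text() %>%                  # extract each comment's contents as a string
  paste0(collapse = "") %>%       # concatenate them into a single string
  read_html() %>%                 # re-parse that string as HTML
  xml_find_all("//table") %>%     # find every table that was hidden in a comment
  html_table()                    # convert each one to a data frame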

# Print the first row of each extracted table
cat(capture.output(lapply(A, head, 1)), sep = "\n")


[[1]]
                   Date Type                                                                                       Note
1 Kevin Love 2017-02-12 Knee Love is expected to miss six weeks after undergoing arthroscopic surgery on his left knee.

[[2]]
            X1                X2
1 Jim Boylan   Assistant Coach

[[3]]
        G    MP   FG  FGA   FG%  3P  3PA  3P%   2P  2PA   2P%   FT  FTA   FT% ORB  DRB  TRB  AST STL BLK TOV   PF  PTS
1 Team 58 14020 2305 4938 0.467 761 1952 0.39 1544 2986 0.517 1073 1420 0.756 564 1988 2552 1304 414 237 804 1033 6444

[[4]]
   NA NA NA NA  NA  NA  NA   NA   NA   NA Advanced   NA Offense Four Factors   NA   NA     NA Defense Four Factors   NA   NA     NA               NA
1   W  L PW PL MOV SOS SRS ORtg DRtg Pace      FTr 3PAr                 eFG% TOV% ORB% FT/FGA                 eFG% TOV% DRB% FT/FGA Arena Attendance

[[5]]
  Rk              Age  G GS   MP  FG  FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF PTS/G
1  1 LeBron James  32 54 54 37.5 9.6 17.7 0.541 1.7 4.4 0.387 7.9 13.3 0.592 0.589 4.8 6.9 0.691 1.1 6.7 7.9 8.9 1.4 0.6 4.3 1.7  25.7

[[6]]
  Rk              Age  G GS   MP  FG FGA   FG% 3P 3PA   3P%  2P 2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF  PTS
1  1 LeBron James  32 54 54 2026 518 957 0.541 92 238 0.387 426 719 0.592 0.589 259 375 0.691  62 363 425 479  74  32 230 92 1387

[[7]]
  Rk              Age  G GS   MP  FG FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF  PTS
1  1 LeBron James  32 54 54 2026 9.2  17 0.541 1.6 4.2 0.387 7.6 12.8 0.592 4.6 6.7 0.691 1.1 6.5 7.6 8.5 1.3 0.6 4.1 1.6 24.6

[[8]]
  Rk              Age  G GS   MP   FG  FGA   FG%  3P 3PA   3P%   2P  2PA   2P%  FT FTA   FT% ORB DRB  TRB  AST STL BLK TOV  PF PTS    ORtg DRtg
1  1 LeBron James  32 54 54 2026 12.7 23.4 0.541 2.3 5.8 0.387 10.4 17.6 0.592 6.3 9.2 0.691 1.5 8.9 10.4 11.7 1.8 0.8 5.6 2.3  34 NA  118  107

[[9]]
  Rk              Age  G   MP  PER   TS%  3PAr   FTr ORB% DRB% TRB% AST% STL% BLK% TOV% USG%    OWS DWS  WS WS/48    OBPM DBPM BPM VORP
1  1 LeBron James  32 54 2026 26.3 0.618 0.249 0.392  3.5 19.1 11.6 41.7  1.8  1.3   17 29.4 NA 6.9 2.4 9.3  0.22 NA  6.3  1.8   8  5.1

[[10]]
     NA   NA   NA   NA   NA   NA                   NA   NA   NA   NA NA   NA              NA   NA   NA   NA NA   NA 2-Pt Field Goals    NA   NA 3-Pt Field Goals     NA
1  <NA> <NA> <NA> <NA> <NA> <NA> % of FGA by Distance <NA> <NA> <NA> NA <NA> FG% by Distance <NA> <NA> <NA> NA <NA>                  Dunks <NA>                  Corner
    NA     NA   NA
1 <NA> Heaves <NA>

[[11]]
  Rk                   Salary
1  1 LeBron James $30,963,450

[[12]]
                           Yr  Tm Rd Pk             Team     G  MP FG FGA   FG% 3P 3PA 3P% FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF PTS
1 Vladimir Veremeenko NA 2006 WAS  2 48 NA Reggio Emilia it 18 139 17  29 0.586  0   0  NA  4   9 0.444  14  10  24   8   2   3   9 33  38
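If you only need one table, such as team_misc, you can select it by id instead of taking every commented table. The following is a minimal sketch of a hypothetical helper, commented_table() (not part of rvest), which first looks for a visible table with that id and only then falls back to the comment contents. It is demonstrated on a made-up toy page so it runs without network access.

```r
library(rvest)  # read_html(), html_nodes(), html_table()
library(xml2)   # xml_find_all(), xml_text()

# Hypothetical helper (not part of rvest): return the <table> with the given id,
# whether it is a regular element or hidden inside an HTML comment
commented_table <- function(doc, id) {
  css <- paste0("table#", id)
  nodes <- html_nodes(doc, css)
  if (length(nodes) == 0) {
    # Not found as an element: collect the text of every comment node and
    # re-parse it, wrapped in <html><body> so read_html() always treats it as markup
    comments <- xml_text(xml_find_all(doc, "//comment()"))
    hidden <- read_html(paste0("<html><body>",
                               paste0(comments, collapse = ""),
                               "</body></html>"))
    nodes <- html_nodes(hidden, css)
  }
  if (length(nodes) == 0) stop("no table with id '", id, "' found")
  html_table(nodes[[1]])
}

# Toy page (hypothetical contents) with one visible and one commented-out table
page <- read_html('<html><body>
  <table id="roster"><tr><th>Player</th></tr><tr><td>Player A</td></tr></table>
  <!-- <table id="team_misc"><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table> -->
</body></html>')

roster <- commented_table(page, "roster")     # found as a normal element
misc   <- commented_table(page, "team_misc")  # found inside a comment
```

Against the live site you would pass read_html("http://www.basketball-reference.com/teams/CLE/2017.html") as doc and "team_misc" as id, assuming the page still embeds that table in a comment (the site's markup may change). Checking visible elements first means the same call also works for tables like roster that are not commented out.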
Carl Boneri