I have spent a long time using R to scrape NBA data. Until now I was working mostly by trial and error, but I finally found this documentation. Some time ago I had problems scraping the shotchartdetail endpoint, and I figured out the cause when I found this.

This works

This is what I did:

shotURLtotal <- paste0("http://stats.nba.com/stats/shotchartdetail?CFID=33&CFPARAMS=2016-17&ContextFilter=&ContextMeasure=FGA&DateFrom=&DateTo=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&MeasureType=Base&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=PerGame&Period=0&PlayerID=0&PlusMinus=N&Position=&Rank=N&RookieYear=&Season=2016-17&SeasonSegment=&SeasonType=Regular+Season&TeamID=0&VsConference=&VsDivision=&mode=Advanced&showDetails=0&showShots=1&showZones=0&PlayerPosition=")

Season <- rjson::fromJSON(file = shotURLtotal, method="C")
Names <- Season$resultSets[[1]][[2]]

Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))

colnames(Season) <- Names
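
One fragile spot in this conversion is `unlist()`: if a row contains a JSON `null`, it is typically parsed as `NULL`, which `unlist()` silently drops, so the matrix can end up misaligned. Below is a minimal sketch of a more defensive conversion, run on a small hand-made list instead of the real response. The helper name `result_set_to_df` and the mock values are assumptions for illustration, and the sketch assumes the result set exposes `headers` and `rowSet` elements.

```r
# Hypothetical helper: turn one result set (headers + rowSet) into a data frame.
result_set_to_df <- function(result_set) {
  headers <- unlist(result_set$headers)
  if (length(result_set$rowSet) == 0) {
    # Empty rowSet: return a zero-row data frame that still has the columns
    empty <- as.data.frame(matrix(nrow = 0, ncol = length(headers)))
    colnames(empty) <- headers
    return(empty)
  }
  rows <- lapply(result_set$rowSet, function(row) {
    # Replace NULL (JSON null) with NA so every row keeps its full length
    row[vapply(row, is.null, logical(1))] <- NA
    as.data.frame(stats::setNames(row, headers), stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Mock result set (made-up values, not real NBA data)
mock <- list(
  headers = list("PLAYER_NAME", "SHOT_DISTANCE", "SHOT_MADE_FLAG"),
  rowSet = list(
    list("Player A", 24, 1),
    list("Player B", NULL, 0)
  )
)
shots <- result_set_to_df(mock)
```

With the real response, the same helper would be called as `result_set_to_df(Season$resultSets[[1]])` before `Season` is overwritten.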

But this does not

But when I try to do the same with the shotchartlineupdetail endpoint, it does not work. I suspect it has to do with the CFID parameter, whose meaning I don't know. This is what I tried:

shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"


Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]

Season <- data.frame(matrix(unlist(Season$resultSets[[1]][[3]]), ncol = length(Names), byrow = TRUE))

colnames(Season) <- Names

Expected Results

The expected result should be a dataframe with the following columns:

c("GRID_TYPE", "GAME_ID", "GAME_EVENT_ID", "GROUP_ID", "GROUP_NAME", "PLAYER_ID", "PLAYER_NAME", "TEAM_ID", "TEAM_NAME", "PERIOD", "MINUTES_REMAINING", "SECONDS_REMAINING", "EVENT_TYPE", "ACTION_TYPE", "SHOT_TYPE", "SHOT_ZONE_BASIC", "SHOT_ZONE_AREA", "SHOT_ZONE_RANGE", "SHOT_DISTANCE", "LOC_X", "LOC_Y", "SHOT_ATTEMPTED_FLAG", "SHOT_MADE_FLAG", "GAME_DATE", "HTM", "VTM")

which you can get by doing:

shoturl <- "http://stats.nba.com/stats/shotchartlineupdetail/?leagueId=00&season=2016-17&seasonType=Regular+Season&teamId=0&outcome=&location=&month=0&seasonSegment=&dateFrom=&dateTo=&opponentTeamId=0&vsConference=&vsDivision=&gameSegment=&period=0&lastNGames=0&gameId=&group_id=0&contextFilter=&contextMeasure=FGA"


Season <- rjson::fromJSON(file = shoturl, method="C")
Names <- Season$resultSets[[1]][[2]]

So Names would hold the columns of the data frame. The problem is that, without the CFID, the list that should hold the data for those columns is empty. The answer that @be_green gives returns the league averages, and I need the team-specific data.
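
A quick way to confirm that symptom before building the data frame is to count the rows in each result set. Here is a minimal sketch on a hand-made list standing in for the parsed response. The helper `count_rows`, the result set names, and the mock content are all assumptions; the real response may be shaped differently.

```r
# Hypothetical helper: number of rows in each result set of a parsed response
count_rows <- function(response) {
  counts <- vapply(response$resultSets,
                   function(rs) length(rs$rowSet), integer(1))
  names(counts) <- vapply(response$resultSets,
                          function(rs) rs$name, character(1))
  counts
}

# Mock response (made-up names and values): detail rows empty, averages present
mock_response <- list(
  resultSets = list(
    list(name = "Details",
         headers = list("GROUP_ID", "GROUP_NAME"),
         rowSet = list()),
    list(name = "LeagueAverages",
         headers = list("GRID_TYPE"),
         rowSet = list(list("League Averages")))
  )
)
row_counts <- count_rows(mock_response)
```

A count of zero for the detail set, next to a non-zero count for the averages, would match the behaviour described here.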

Derek Corcoran
  • Could you give an example of the output you expect? – be_green Dec 13 '17 at 15:52
  • Hi @be_green, per your request I added the expected results. They are very similar to the ones shown in the example that worked, but they also include the players who are on the court as a variable – Derek Corcoran Dec 13 '17 at 16:23
  • It looks like your API request returns a null rowset for those variables--that might be the issue? – be_green Dec 13 '17 at 16:38
  • Hi @be_green, that is the issue. If you try the first example I give and take out the `?CFID=33&CFPARAMS=2016-17&` part, it returns an empty data frame as well. So it seems that the **CFID** is what we need to figure out in order to get the data. Unfortunately, **CFID** is not documented, and I am not sure how to get that parameter – Derek Corcoran Dec 13 '17 at 16:56
  • Oh sorry, I misunderstood the problem completely! My bad. – be_green Dec 13 '17 at 17:09
  • No problem @be_green, I hope you don't get discouraged by it and keep on trying :D. Get those 50 exp points!!! – Derek Corcoran Dec 13 '17 at 17:15
  • This might be part of the problem--looks like the underlying endpoints changed: https://github.com/seemethere/nba_py/issues/67 – be_green Dec 13 '17 at 17:20
  • Is there a web address on the stats.nba.com site that shows what you want? You can get the query from loading that page. – Eumenedies Dec 19 '17 at 10:45
  • @Eumenedies I have not found it yet – Derek Corcoran Dec 19 '17 at 12:55
  • @DerekCorcoran As near as I can tell, the API isn't really an API at all. It's clunky and seems to be designed only to serve the tables that appear on the website. It's entirely possible that the shotchartlineupdetail endpoint is deprecated if there's no page that uses it. – Eumenedies Dec 19 '17 at 12:57
  • @DerekCorcoran to be clear, what difference do you expect between the shotchartdetail and shotchartlineupdetail? Is it just the group column? Can you use the shotchartdetail table and get the group columns from somewhere else? – Eumenedies Dec 19 '17 at 12:59
  • @Eumenedies, it has two extra columns which identify the players on the court at the moment the shot was taken – Derek Corcoran Dec 19 '17 at 14:47
  • @Eumenedies the two columns that I am missing are `"GROUP_ID", "GROUP_NAME"` – Derek Corcoran Dec 19 '17 at 16:17
  • @DerekCorcoran endpoint /stats/teamdashlineups has "GROUP_ID" and "GROUP_NAME" – svujic Dec 20 '17 at 01:35
  • Hi Derek - you might look at http://www.gregreda.com/2015/02/15/web-scraping-finding-the-api/ it is an example of figuring out API parameters. – James Thomas Durant Dec 20 '17 at 02:39
  • @JamesThomasDurant That's the sort of technique I was thinking of but, if you can't find a page that uses the endpoint you are looking for then you can't monitor the request. – Eumenedies Dec 20 '17 at 10:50
  • @Eumenedies - I looked as well and could not find the page either. I did see this: http://rstudio-pubs-static.s3.amazonaws.com/11288_111663babc4f44359a35b1f5f1a22b89.html which seems to be another way to get summarized data. I also played with this: https://nycdatascience.com/blog/student-works/nba-lineup-data/ which seems to download the player and lineup data and combine them. It might be another approach, although the format of the data has changed slightly so modifications would be needed. – James Thomas Durant Dec 20 '17 at 14:17
  • @Eumenedies I will check the last thing you said, I am trying to figure this out – Derek Corcoran Dec 20 '17 at 14:45
  • I tried emailing the NBA to see if they would provide an explanation - I guess the mere name of "Durant" would provoke a response. – James Thomas Durant Jan 05 '18 at 20:22
  • Hahahaha, I am @JamesThomasDurant brother of a former MVP, hahahhahahah, let me know if you get a response – Derek Corcoran Jan 06 '18 at 13:42
  • @JamesThomasDurant please let me know if they answer – Derek Corcoran Jan 06 '18 at 13:47
  • No responses... They can probably tell from my shooting statistics that I am not even close to related to an MVP. – James Thomas Durant Jan 06 '18 at 18:56

1 Answer

I believe the issue here is that you need to pass a PlayerID and a TeamID to the API. Using PlayerID = 2544 and TeamID = 1610612739 as an example seems to work:

library(tidyverse)
res <- jsonlite::read_json("https://stats.nba.com/stats/shotchartdetail?AheadBehind=&ClutchTime=&ContextFilter=&ContextMeasure=PTS&DateFrom=&DateTo=&EndPeriod=&EndRange=&GameID=&GameSegment=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&Period=0&PlayerID=2544&PlayerPosition=&PointDiff=&Position=&RangeType=&RookieYear=&Season=&SeasonSegment=&SeasonType=Regular+Season&StartPeriod=&StartRange=&TeamID=1610612739&VsConference=&VsDivision=")
# res %>% str(max.level = 3)

header_names <- flatten_chr(res$resultSets[[1]]$headers)
header_names
#>  [1] "GRID_TYPE"           "GAME_ID"             "GAME_EVENT_ID"      
#>  [4] "PLAYER_ID"           "PLAYER_NAME"         "TEAM_ID"            
#>  [7] "TEAM_NAME"           "PERIOD"              "MINUTES_REMAINING"  
#> [10] "SECONDS_REMAINING"   "EVENT_TYPE"          "ACTION_TYPE"        
#> [13] "SHOT_TYPE"           "SHOT_ZONE_BASIC"     "SHOT_ZONE_AREA"     
#> [16] "SHOT_ZONE_RANGE"     "SHOT_DISTANCE"       "LOC_X"              
#> [19] "LOC_Y"               "SHOT_ATTEMPTED_FLAG" "SHOT_MADE_FLAG"     
#> [22] "GAME_DATE"           "HTM"                 "VTM"

res$resultSets[[1]]$rowSet %>%
  map(`[`, 1:24) %>%
  map(~ set_names(., header_names)) %>%
  bind_rows()
#> # A tibble: 8,369 x 24
#>    GRID_TYPE GAME_ID GAME_EVENT_ID PLAYER_ID PLAYER_NAME TEAM_ID TEAM_NAME
#>    <chr>     <chr>           <int>     <int> <chr>         <int> <chr>    
#>  1 Shot Cha~ 002030~            20      2544 LeBron Jam~  1.61e9 Clevelan~
#>  2 Shot Cha~ 002030~            28      2544 LeBron Jam~  1.61e9 Clevelan~
#>  3 Shot Cha~ 002030~            35      2544 LeBron Jam~  1.61e9 Clevelan~
#>  4 Shot Cha~ 002030~            54      2544 LeBron Jam~  1.61e9 Clevelan~
#>  5 Shot Cha~ 002030~            67      2544 LeBron Jam~  1.61e9 Clevelan~
#>  6 Shot Cha~ 002030~            76      2544 LeBron Jam~  1.61e9 Clevelan~
#>  7 Shot Cha~ 002030~           224      2544 LeBron Jam~  1.61e9 Clevelan~
#>  8 Shot Cha~ 002030~           233      2544 LeBron Jam~  1.61e9 Clevelan~
#>  9 Shot Cha~ 002030~           235      2544 LeBron Jam~  1.61e9 Clevelan~
#> 10 Shot Cha~ 002030~           322      2544 LeBron Jam~  1.61e9 Clevelan~
#> # ... with 8,359 more rows, and 17 more variables: PERIOD <int>,
#> #   MINUTES_REMAINING <int>, SECONDS_REMAINING <int>, EVENT_TYPE <chr>,
#> #   ACTION_TYPE <chr>, SHOT_TYPE <chr>, SHOT_ZONE_BASIC <chr>,
#> #   SHOT_ZONE_AREA <chr>, SHOT_ZONE_RANGE <chr>, SHOT_DISTANCE <int>,
#> #   LOC_X <int>, LOC_Y <int>, SHOT_ATTEMPTED_FLAG <int>,
#> #   SHOT_MADE_FLAG <int>, GAME_DATE <chr>, HTM <chr>, VTM <chr>

Created on 2019-03-26 by the reprex package (v0.2.1)
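
The long query string is easier to vary (for example, to swap PlayerID and TeamID) if it is assembled from a named list. Here is a minimal sketch in base R. The helper name `build_stats_url` is hypothetical, and only a handful of the parameters from the URL above are shown; the endpoint may well require the full set.

```r
# Hypothetical helper: assemble a stats.nba.com URL from a named list of parameters
build_stats_url <- function(endpoint, params) {
  # reserved = TRUE also escapes characters such as spaces and "&" in values
  values <- vapply(params,
                   function(v) utils::URLencode(as.character(v), reserved = TRUE),
                   character(1))
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0("https://stats.nba.com/stats/", endpoint, "?", query)
}

# IDs are passed as strings so they are never reformatted as numbers
url <- build_stats_url(
  "shotchartdetail",
  list(ContextMeasure = "PTS",
       LeagueID       = "00",
       PlayerID       = "2544",
       SeasonType     = "Regular Season",
       TeamID         = "1610612739")
)
```

Note that this encodes the space in `Regular Season` as `%20` rather than the `+` used in the hand-written URL; both are common encodings for a space in a query string.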

JasonAizkalns