I am trying to scrape data from NBA stats, specifically the team box scores page (https://www.nba.com/stats/teams/boxscores/?Season=2021-22&SeasonType=Regular%20Season). I am looking for the nba_api endpoint for this page so that I can scrape the data.

How can I find the endpoint?

Takomochi

2 Answers

You can find the endpoint by opening Dev Tools (Shift+Ctrl+I) and looking under Network -> XHR (you may need to refresh the page). Watch the panel as the requests start popping up, and find the one that returns your data. Its Headers tab has the info needed to make the request:

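import requests
import pandas as pd

# Endpoint found in Dev Tools under Network -> XHR
url = 'https://stats.nba.com/stats/leaguegamelog'

# Without the Referer header, stats.nba.com does not return a response
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'Referer': 'https://www.nba.com/'}

# Query-string parameters copied from the request's Headers tab;
# 'PlayerOrTeam': 'T' asks for team game logs
payload = {
    'Counter': '1000',
    'DateFrom': '',
    'DateTo': '',
    'Direction': 'DESC',
    'LeagueID': '00',
    'PlayerOrTeam': 'T',
    'Season': '2021-22',
    'SeasonType': 'Regular Season',
    'Sorter': 'DATE'}

# A timeout stops the script from hanging forever if the server never answers
response = requests.get(url, headers=headers, params=payload, timeout=30)
response.raise_for_status()
jsonData = response.json()

# The table is in the first result set: column names in 'headers', rows in 'rowSet'
rows = jsonData['resultSets'][0]['rowSet']
columns = jsonData['resultSets'][0]['headers']

df = pd.DataFrame(rows, columns=columns)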

Output:

print(df)
     SEASON_ID     TEAM_ID TEAM_ABBREVIATION  ...  PTS PLUS_MINUS VIDEO_AVAILABLE
0        22021  1610612759               SAS  ...  110          2               1
1        22021  1610612744               GSW  ...  108         -2               1
2        22021  1610612761               TOR  ...   93          5               1
3        22021  1610612755               PHI  ...   88         -5               1
4        22021  1610612738               BOS  ...  124         20               1
       ...         ...               ...  ...  ...        ...             ...
2133     22021  1610612754               IND  ...  122         -1               1
2134     22021  1610612749               MIL  ...  127         23               1
2135     22021  1610612751               BKN  ...  104        -23               1
2136     22021  1610612744               GSW  ...  121          7               1
2137     22021  1610612747               LAL  ...  114         -7               1

[2138 rows x 29 columns]
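The code above reads only the first entry of `resultSets`. A sketch that converts every entry into a DataFrame might look like this. The function name `result_sets_to_frames` is hypothetical, and it assumes each entry carries `headers` and `rowSet` keys plus, usually, a `name` key (it falls back to the list position when `name` is missing). The `sample` dict is a toy stand-in, not real API output.

```python
import pandas as pd


def result_sets_to_frames(json_data):
    """Convert each result set in a stats.nba.com-style JSON reply into a DataFrame.

    Hypothetical helper: assumes json_data["resultSets"] is a list of dicts
    with "headers" (column names) and "rowSet" (list of rows).
    """
    frames = {}
    for position, result_set in enumerate(json_data["resultSets"]):
        # Use the result set's "name" if present, else its position (assumption)
        name = result_set.get("name", str(position))
        frames[name] = pd.DataFrame(result_set["rowSet"], columns=result_set["headers"])
    return frames


# Toy stand-in for the parsed JSON, shaped like the response used above
sample = {
    "resultSets": [
        {
            "name": "LeagueGameLog",
            "headers": ["TEAM_ABBREVIATION", "PTS"],
            "rowSet": [["SAS", 110], ["GSW", 108]],
        }
    ]
}

frames = result_sets_to_frames(sample)
print(frames["LeagueGameLog"])
```

With the real response, `result_sets_to_frames(jsonData)` would take the place of the `rows`/`columns` lines and keep any extra tables an endpoint returns.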
chitown88
  • When I try to view the URL https://stats.nba.com/stats/leaguegamelog in a browser, it spins and spins but never displays any data. I tried it in Mozilla and Chrome, and neither worked. – ASH Jun 13 '22 at 22:11
  • @ASH, correct. You will not get a response if you try to access that URL directly through the browser. If you do it programmatically, with Python in this case, you do need to include the `'Referer'` in the headers (otherwise, again, you don't get a response). My guess is that the server checks the Referer, which indicates the site the request comes from; a request typed directly into the address bar doesn't come from the nba.com site, so it hangs. You can read about Referer [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Referer) – chitown88 Jun 14 '22 at 11:49
  • I think it makes sense. I still don't know how you got from here: https://www.nba.com/stats/teams/boxscores/?Season=2021-22&SeasonType=Regular%20Season to here: https://stats.nba.com/stats/leaguegamelog I know Python, at least sort of, but I almost never work with JSON. Maybe that's what's throwing me off. Just curious how you decided to use the URL that you posted rather than the one that the OP posted. – ASH Jun 14 '22 at 13:54
  • If you go to the plain site and open Dev Tools (Shift+Ctrl+I), you can see the requests being made. You may need to refresh. As you click around, you can find the request URLs in that panel (Network -> XHR -> Headers) – chitown88 Jun 14 '22 at 13:56
  • And just as a note: if you are familiar with Python, I'm assuming you've worked with dictionaries and lists. JSON is basically just that. – chitown88 Jun 14 '22 at 13:57
  • Ahhh! Now I see it! Thanks so much for sharing this knowledge!! I will definitely be using this sometime in the near future!!! – ASH Jun 14 '22 at 14:49
  • Hey there, chitown88. How did you get the payload and headers? If I go to that link and hit Ctrl+Shift+I, I can click Network and XHR, and I see a bunch of rows listed under Domain and File. How do you know which one to click on to get the payload and headers? If I click on Domain and File, I see 'Response Headers' in a small window in the lower right-hand corner of the browser window, but nothing matches what you posted in your answer. Your answer works, but I can't figure out how you got that from Domain, File, and XHR in the web browser. – ASH Feb 23 '23 at 14:17
  • It's through XHR – chitown88 Feb 24 '23 at 19:40
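Following the Referer discussion in the comments above: one way to see exactly what the Python version sends, without contacting the server, is to build the request with `requests.Request(...).prepare()` and compare its URL and headers with the request listed in Dev Tools. This is a sketch: the user-agent string is shortened, the payload is trimmed to a few of the answer's keys, and the server's header requirements may have changed since these comments were written.

```python
import requests

url = "https://stats.nba.com/stats/leaguegamelog"
headers = {
    "user-agent": "Mozilla/5.0",  # shortened; the answer uses a full Chrome string
    "Referer": "https://www.nba.com/",  # the header the answer says is required
}
# Trimmed subset of the answer's payload
payload = {"LeagueID": "00", "PlayerOrTeam": "T", "Season": "2021-22",
           "SeasonType": "Regular Season"}

# Build the request locally; nothing is sent over the network here
prepared = requests.Request("GET", url, headers=headers, params=payload).prepare()

print(prepared.url)                 # full URL with the encoded query string
print(prepared.headers["Referer"])  # a URL typed into the address bar sends no Referer
```

To actually send it, `requests.Session().send(prepared, timeout=30)` works on the same prepared object.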
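To illustrate the comment above that JSON is basically dictionaries and lists: `json.loads` turns a response body into ordinary Python objects, and the indexing in the answer is plain dict and list access. The string below is a trimmed, made-up body in the same shape as the `leaguegamelog` response, not real data.

```python
import json

# Trimmed, made-up body shaped like the leaguegamelog response
body = '''
{"resultSets": [{"name": "LeagueGameLog",
                 "headers": ["TEAM_ABBREVIATION", "PTS"],
                 "rowSet": [["SAS", 110], ["GSW", 108]]}]}
'''

data = json.loads(body)            # JSON object -> dict
first_set = data["resultSets"][0]  # JSON array -> list, so index with [0]
columns = first_set["headers"]     # list of column names
rows = first_set["rowSet"]         # list of lists, one per row

# Pair each row with the column names to get one dict per team-game
records = [dict(zip(columns, row)) for row in rows]
print(type(data).__name__, type(data["resultSets"]).__name__)  # dict list
print(records)
```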
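On the question in the comments of where the payload comes from: once you have clicked the XHR request whose response contains the table, its Headers tab shows the full Request URL, and the query string after `?` holds the same key/value pairs as the `payload` dict in the answer. A minimal standard-library sketch for turning a copied URL back into `url` and `payload` follows. The function name `split_request_url` is hypothetical, and the example URL was rebuilt from the answer's payload rather than copied from the site.

```python
from urllib.parse import parse_qsl, urlsplit


def split_request_url(request_url):
    """Split a copied Request URL into a base URL and a params dict (hypothetical helper)."""
    parts = urlsplit(request_url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True keeps empty parameters such as DateFrom=
    payload = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, payload


# Rebuilt from the answer's payload, not copied from nba.com
copied = ("https://stats.nba.com/stats/leaguegamelog?Counter=1000&DateFrom=&DateTo="
          "&Direction=DESC&LeagueID=00&PlayerOrTeam=T&Season=2021-22"
          "&SeasonType=Regular%20Season&Sorter=DATE")

url, payload = split_request_url(copied)
print(url)
print(payload)
```

The resulting `url` and `payload` can be passed straight to `requests.get(url, headers=headers, params=payload)`.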
I'm not a huge sports fan, but this seems like it: A free NBA Boxscore API. You could attempt to isolate the CSS, JS and HTML segments from the site.

The data is in the source of a div.
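If the data is indeed present in a div in the page source, as this answer suggests, Python's built-in `html.parser` can pull out its text. A minimal sketch: the class name `DivTextExtractor`, the div id `boxscore`, and the HTML snippet are all hypothetical, so inspect the real page source to find the actual element. If the table is filled in by JavaScript after the page loads, it will not be in the raw HTML at all, and the XHR approach from the other answer applies instead.

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collect the text inside the <div> whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # > 0 while inside the target div; tracks nested divs
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


# Hypothetical snippet; the real page's markup and ids will differ
html = ('<div id="boxscore"><table><tr><td>SAS</td><td>110</td></tr></table></div>'
        '<div>not collected</div>')

parser = DivTextExtractor("boxscore")
parser.feed(html)
print(parser.chunks)
```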

Mikus