
I want to scrape the NBA advanced stats. To start, I just want to scrape the team names, but my code is not collecting any of the information. I may be looking for the wrong thing in the find_all function. Any help is appreciated!

import requests
from bs4 import BeautifulSoup

url = "https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1"
result = requests.get(url)
c = result.content

soup = BeautifulSoup(c,"html.parser")

title = soup.title.text
print(title)

teams = soup.find_all('td',{'class':'team'})

for element in teams:
    print(element.text)

Site that I want to scrape:

kenlukas

3 Answers


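The table is rendered dynamically with JavaScript, so the HTML that requests receives does not contain it. One way around this is to load the page in a real browser with selenium: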

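from selenium import webdriver
from bs4 import BeautifulSoup as soup

# Selenium 3 style; Selenium 4+ takes the driver path via a Service object instead
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1')
# Parse the browser-rendered HTML and locate the stats table
s = soup(d.page_source, 'html.parser').find('table', {'class':'table'})
# headers: text of every <th>; data: cell text of every <tr> after the first
headers, [_, *data] = [i.text for i in s.find_all('th')], [[i.text for i in b.find_all('td')] for b in s.find_all('tr')]
# Drop leftover rows that have at most one cell
final_data = [i for i in data if len(i) > 1]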

Now final_data holds the results for every team:

[['Houston Rockets', '63', '38', '25', '242.0', '367.0', '8.8', '2.4', '3.8', '64.2', '0.4', '0.7', '62.8', '5.5', '-', '3.7', '-', '0.5', '14.0', '0.5', '5.4', '0.3', '-'], ['Milwaukee Bucks', '63', '48', '15', '241.2', '409.5', '9.5', '2.3', '3.6', '62.4', '0.7', '1.0', '73.3', '5.4', '-', '4.3', '-', '0.6', '13.0', '0.5', '5.2', '0.4', '-'], ['New York Knicks', '62', '13', '49', '241.6', '420.4', '9.5', '2.0', '3.4', '56.8', '0.7', '1.0', '69.8', '4.8', '-', '4.7', '-', '0.6', '13.7', '0.5', '5.3', '0.5', '-'], ['Charlotte Hornets', '63', '29', '34', '242.0', '409.7', '9.6', '1.7', '3.5', '50.0', '1.1', '1.5', '71.9', '4.7', '-', '4.6', '-', '0.7', '14.2', '0.4', '4.5', '0.7', '-'], ['Detroit Pistons', '62', '31', '31', '242.8', '437.0', '10.0', '1.6', '3.2', '51.3', '0.9', '1.2', '75.3', '4.4', '-', '5.0', '-', '0.9', '17.6', '0.7', '6.8', '0.6', '-'], ['Washington Wizards', '62', '25', '37', '243.2', '420.2', '10.5', '2.5', '4.3', '58.4', '0.9', '1.2', '76.4', '6.1', '-', '4.6', '-', '0.7', '15.5', '0.6', '5.6', '0.5', '-'], ['Atlanta Hawks', '64', '22', '42', '242.3', '434.9', '11.0', '2.2', '3.7', '58.6', '1.2', '1.5', '77.3', '5.7', '-', '5.3', '-', '0.7', '12.9', '0.7', '6.5', '0.7', '-'], ['Brooklyn Nets', '65', '32', '33', '243.8', '440.3', '11.2', '2.5', '4.4', '58.3', '1.2', '1.7', '70.8', '6.4', '-', '4.6', '-', '0.7', '14.9', '0.9', '7.9', '0.8', '-'], ['San Antonio Spurs', '64', '35', '29', '241.6', '402.3', '11.3', '2.3', '4.1', '55.5', '0.8', '1.0', '85.7', '5.6', '-', '5.8', '-', '1.1', '18.7', '0.5', '4.8', '0.4', '-'], ['Boston Celtics', '64', '38', '26', '241.6', '420.8', '11.5', '2.5', '4.2', '58.4', '0.5', '0.7', '71.7', '5.5', '-', '5.7', '-', '0.9', '15.0', '0.6', '5.6', '0.3', '-'], ['Toronto Raptors', '64', '46', '18', '242.3', '418.0', '11.5', '3.5', '5.9', '59.6', '1.2', '1.5', '78.1', '8.3', '-', '4.1', '-', '0.7', '16.3', '0.4', '3.7', '0.7', '-'], ['Portland Trail Blazers', '63', '39', '24', '241.6', '409.8', '11.8', '2.4', '4.6', 
'51.9', '1.2', '1.5', '80.2', '6.1', '-', '5.5', '-', '1.0', '18.8', '0.7', '5.7', '0.7', '-'], ['Utah Jazz', '61', '36', '25', '240.8', '435.9', '11.9', '2.0', '3.8', '51.1', '1.4', '2.2', '66.7', '5.4', '-', '5.9', '-', '1.0', '17.1', '0.7', '5.9', '1.0', '-'], ['Minnesota Timberwolves', '63', '29', '34', '241.6', '412.4', '12.0', '2.9', '5.0', '57.3', '1.3', '1.6', '79.8', '7.3', '-', '5.2', '-', '1.0', '19.5', '0.6', '5.2', '0.7', '-'], ['Chicago Bulls', '63', '18', '45', '243.2', '411.3', '12.4', '2.8', '4.8', '57.9', '0.7', '0.9', '77.6', '6.4', '-', '6.3', '-', '0.8', '12.4', '0.6', '4.5', '0.4', '-'], ['LA Clippers', '65', '36', '29', '241.9', '430.4', '12.4', '2.9', '5.1', '56.9', '1.0', '1.5', '69.5', '7.0', '-', '5.4', '-', '0.9', '15.9', '0.7', '5.5', '0.6', '-'], ['Miami Heat', '62', '28', '34', '240.4', '426.1', '12.6', '2.0', '4.0', '50.2', '0.7', '1.3', '56.8', '4.9', '-', '7.0', '-', '1.1', '15.4', '0.4', '3.4', '0.5', '-'], ['New Orleans Pelicans', '65', '29', '36', '240.0', '435.0', '12.6', '3.5', '6.4', '54.8', '1.2', '1.6', '74.5', '8.4', '-', '4.4', '-', '0.9', '20.4', '0.7', '5.2', '0.8', '-'], ['Phoenix Suns', '64', '13', '51', '242.3', '435.8', '12.9', '2.8', '5.0', '56.7', '1.0', '1.3', '73.5', '6.8', '-', '6.2', '-', '0.8', '13.7', '0.6', '4.7', '0.6', '-'], ['Oklahoma City Thunder', '63', '39', '24', '242.0', '364.8', '13.6', '3.2', '5.8', '54.5', '1.0', '1.4', '65.9', '7.5', '-', '5.8', '-', '0.9', '14.7', '0.7', '4.8', '0.6', '-'], ['Dallas Mavericks', '62', '27', '35', '240.8', '435.4', '13.9', '1.8', '3.1', '55.9', '1.2', '1.6', '76.5', '5.0', '-', '8.6', '-', '1.1', '13.1', '0.8', '5.7', '0.7', '-'], ['Golden State Warriors', '63', '44', '19', '241.6', '442.3', '13.9', '2.8', '4.8', '57.0', '1.2', '1.5', '81.7', '6.9', '-', '7.2', '-', '1.6', '21.7', '0.8', '5.8', '0.7', '-'], ['Orlando Magic', '63', '28', '35', '241.2', '405.0', '14.0', '3.2', '5.7', '55.8', '1.1', '1.4', '80.9', '7.7', '-', '6.5', '-', '1.4', '21.8', '0.6', '4.0', 
'0.7', '-'], ['Los Angeles Lakers', '63', '30', '33', '241.6', '405.9', '14.2', '3.3', '5.7', '57.8', '1.1', '1.6', '67.0', '7.8', '-', '6.3', '-', '1.3', '20.7', '0.9', '6.3', '0.7', '-'], ['Denver Nuggets', '62', '42', '20', '240.8', '435.2', '15.0', '3.1', '5.3', '59.1', '1.1', '1.5', '72.5', '7.5', '-', '7.4', '-', '1.7', '22.3', '1.0', '6.4', '0.7', '-'], ['Indiana Pacers', '64', '41', '23', '240.4', '431.7', '15.3', '4.4', '7.2', '60.6', '1.4', '1.9', '74.2', '10.4', '-', '5.8', '-', '1.2', '20.9', '0.9', '6.0', '0.9', '-'], ['Cleveland Cavaliers', '64', '16', '48', '241.2', '407.3', '16.1', '2.3', '4.5', '51.6', '0.9', '1.1', '80.0', '5.6', '-', '10.0', '-', '1.2', '12.3', '0.5', '3.4', '0.4', '-'], ['Philadelphia 76ers', '63', '40', '23', '242.0', '446.9', '16.6', '2.5', '4.7', '52.7', '1.4', '1.7', '82.6', '6.6', '-', '9.6', '-', '1.8', '18.6', '0.7', '4.3', '0.7', '-'], ['Sacramento Kings', '62', '31', '31', '240.8', '425.2', '16.7', '3.2', '6.3', '50.3', '1.1', '1.6', '65.3', '7.5', '-', '8.0', '-', '1.5', '18.3', '1.0', '6.2', '0.7', '-'], ['Memphis Grizzlies', '65', '25', '40', '241.9', '452.1', '20.5', '3.4', '6.7', '51.3', '1.5', '1.9', '81.1', '8.6', '-', '11.2', '-', '1.6', '14.1', '0.8', '4.1', '0.8', '-']]

To get just the teams:

teams = [a for a, *_ in final_data]

Output:

['Houston Rockets', 'Milwaukee Bucks', 'New York Knicks', 'Charlotte Hornets', 'Detroit Pistons', 'Washington Wizards', 'Atlanta Hawks', 'Brooklyn Nets', 'San Antonio Spurs', 'Boston Celtics', 'Toronto Raptors', 'Portland Trail Blazers', 'Utah Jazz', 'Minnesota Timberwolves', 'Chicago Bulls', 'LA Clippers', 'Miami Heat', 'New Orleans Pelicans', 'Phoenix Suns', 'Oklahoma City Thunder', 'Dallas Mavericks', 'Golden State Warriors', 'Orlando Magic', 'Los Angeles Lakers', 'Denver Nuggets', 'Indiana Pacers', 'Cleveland Cavaliers', 'Philadelphia 76ers', 'Sacramento Kings', 'Memphis Grizzlies']

To get specific statistics, it is easiest to create a list of dictionaries by binding the header values to the data lists:

data_attrs = [dict(zip(headers, i)) for i in final_data]
all_touches = [i['Touches'] for i in data_attrs]
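
As a rough illustration of the binding step above, here is a minimal sketch using mock values. The header names `TEAM`, `GP` and `Touches` and the two rows are assumptions for the example, not necessarily what the scraped table contains. Because every scraped cell is a string, the sketch also converts a column to `float` before comparing values.

```python
# Mock stand-ins for the scraped `headers` and `final_data` (assumed values).
headers = ['TEAM', 'GP', 'Touches']
final_data = [
    ['Houston Rockets', '63', '367.0'],
    ['Milwaukee Bucks', '63', '409.5'],
]

# Bind each header to the cell in the same column position.
data_attrs = [dict(zip(headers, row)) for row in final_data]

# Cells are strings, so convert before doing arithmetic or comparisons.
touches_by_team = {d['TEAM']: float(d['Touches']) for d in data_attrs}
most_touches = max(touches_by_team, key=touches_by_team.get)
print(most_touches)
```

Note that `zip` stops at the shorter of its inputs, so a row with fewer cells than there are headers would silently lose its trailing keys.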
Ajax1234
  • Thank you for the help. Is there anywhere you could direct me to understand what you did in these lines: `headers, [_, *data] = [i.text for i in s.find_all('th')], [[i.text for i in b.find_all('td')] for b in s.find_all('tr')]` and `final_data = [i for i in data if len(i) > 1]`? (Not sure how to format this in a comment.) – Brennan Mosher Mar 05 '19 at 03:22
  • Also, how would I access specific stats using this code? – Brennan Mosher Mar 05 '19 at 03:25
  • @BrennanMosher Those lines consist of list comprehensions, forming the table data from the `bs4` object. `[[i.text for i in b.find_all('td')] for b in s.find_all('tr')]` is a nested comprehension, and is creating the table data as a list of lists, while `[i for i in data if len(i) > 1]` is removing all extraneous elements found during the parsing by checking the lengths of the sublists. For more on list comprehensions, see [here](https://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/). To access specific stats, please see my recent edit. – Ajax1234 Mar 05 '19 at 03:29
  • @Ajax1234 - if you have the time and energy, can you please explain what ` [_, *data] ` does? – Jack Fleeting Mar 05 '19 at 11:57
  • @JackFleeting No problem. `[_, *data]` uses unpacking: the assigned value, `[[i.text for i in b.find_all('td')] for b in s.find_all('tr')]`, has an empty list as its first element (the header row contains no `td` cells), and binding that element to `_`, a [throwaway variable](https://stackoverflow.com/questions/36315309/how-does-python-throw-away-variable-work), is a much cleaner way of discarding it while storing the remaining rows in the list `data`. See more on unpacking [here](https://medium.com/understand-the-python/understanding-the-asterisk-of-python-8b9daaa4a558) – Ajax1234 Mar 05 '19 at 15:06
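
The unpacking and filtering discussed in these comments can be tried without a browser. The following is a small sketch on mock rows; the `rows` list, including the stray `['footer']` entry, is made up for illustration and only mimics the shape of what the nested comprehension returns.

```python
# Mock result of the nested comprehension: the header row has no <td> cells,
# so it yields an empty list; a stray single-cell row is included as well.
rows = [
    [],
    ['Houston Rockets', '63', '38'],
    ['footer'],
    ['Milwaukee Bucks', '63', '48'],
]

# Star-unpacking: `_` takes the first element, `data` collects the rest as a list.
_, *data = rows

# Keep only the rows that have more than one cell.
final_data = [row for row in data if len(row) > 1]

# The same pattern pulls the first item out of every remaining row.
teams = [name for name, *_ in final_data]
print(teams)
```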

Another way to do this is to send a GET request to the site's API and receive a JSON response. By altering the params you can get different results.

You can find the URL your browser sends the request to in the Network tab of Chrome's developer tools.

import requests

url = "https://stats.nba.com/stats/leaguedashptstats?"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

params = {
    "PerMode": "PerGame",
    "PlayerOrTeam": "Team",
    "PtMeasureType": "ElbowTouch",
    "Season": "2018-19",
    "SeasonType": "Regular Season",
    "StarterBench": "",
    "PlayerPosition": "",
    "PlayerExperience": "",
    "GameScope": "",
    "VsConference": "",
    "VsDivision": "",
    "DateFrom": "",
    "DateTo": "",
    "SeasonSegment": "",
    "Location": "",
    "Outcome": "",
    "LastNGames": "0",
    "Month": "0",
    "OpponentTeamID": "0"
}

r = requests.get(url, params=params, headers=headers)
data = r.json()
results = data['resultSets'][0]['rowSet']

for result in results:
    print(result)
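
The rows printed above are bare lists. Here is a minimal sketch of labelling them, assuming each entry of `resultSets` carries a `headers` list alongside `rowSet`; that key is an assumption to check against the actual response. The payload below is a mock in that shape with made-up column names, not real API output.

```python
# Mock payload in the assumed shape of the JSON response (not real API data).
data = {
    'resultSets': [{
        'headers': ['TEAM_NAME', 'GP', 'ELBOW_TOUCHES'],  # assumed column names
        'rowSet': [
            ['Houston Rockets', 63, 8.8],
            ['Milwaukee Bucks', 63, 9.5],
        ],
    }]
}

result_set = data['resultSets'][0]

# Pair each column name with the value in the same position of every row.
records = [dict(zip(result_set['headers'], row)) for row in result_set['rowSet']]

for record in records:
    print(record['TEAM_NAME'], record['ELBOW_TOUCHES'])
```

With the real response, replacing the mock `data` with `r.json()` should be the only change needed, provided the assumed `headers` key is present.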
sammyyy

A variation on @Ajax1234's answer can get you the whole table loaded into a dataframe:

import pandas as pd

# read_html returns a list of DataFrames, one per table found in the HTML
df = pd.read_html(str(s))[0]

And there's your table, in df.

Jack Fleeting