Web scraping splits data for each player on Basketball Reference for the 2021-22 season

Question

I am currently trying to develop an unbiased rating system in R for NBA players over the course of a season. One very important piece of information I am missing comes from the "splits" section for each player, which shows how many of the games he played in his team won. For example, Darius Garland played in 68 games last season, and his team won 37 of them.

What I need is a CSV file with 2 columns: the number of wins and the "code" of the player (for example, Garland's code is garlada01). I already have another table in a CSV file, and I want to join the 2 data frames in R on a shared key. This "code" is the perfect key for that.

Do you have any idea or guidance on how to do this? I have never done web scraping before and my Python knowledge is not that good yet.

share the url and csv file – chitown88, Aug 18 '22 at 07:33

TheOneTrueFives · Answer 1 · 2022-08-17T14:40:00.167

This would best be done using BeautifulSoup, and would look something like this.

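import sys

import requests
from bs4 import BeautifulSoup

url = ''  # Use whatever URL you're scraping from
r = requests.get(url)
# Stop early if the server did not return a successful response
if r.status_code != 200:
    print("Could not connect to webpage")
    sys.exit(1)
# Parse the downloaded HTML into a searchable tree
soup = BeautifulSoup(r.content, 'html.parser')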

Now that you have the BeautifulSoup object, you can parse the HTML that you got from the webpage and look for the specific tags that contain the data you're looking for. I can't say what those tags are; you would have to figure them out by inspecting the page.
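As a rough sketch of that tag-hunting step, the snippet below parses a small made-up HTML table and writes the two-column CSV the question asks for. The table layout, the class names "split" and "games", and the helper names extract_wins and to_csv are all assumptions for illustration; the real page's markup will differ and has to be inspected in the browser first. The Garland numbers come from the question.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page. The real site's
# tags, ids, and class names are NOT known here and must be inspected.
SAMPLE_HTML = """
<table id="splits">
  <tr><td class="split">Total</td><td class="games">68</td></tr>
  <tr><td class="split">Win</td><td class="games">37</td></tr>
</table>
"""


def extract_wins(html):
    """Return the games count from the row labelled 'Win', or None."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        label = row.find("td", class_="split")
        games = row.find("td", class_="games")
        if label is None or games is None:
            continue
        if label.get_text(strip=True) == "Win":
            return int(games.get_text(strip=True))
    return None


def to_csv(rows):
    """Serialize (code, wins) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["code", "wins"])
    writer.writerows(rows)
    return buffer.getvalue()


wins = extract_wins(SAMPLE_HTML)
csv_text = to_csv([("garlada01", wins)])
print(csv_text)
```

In practice you would loop over your list of player codes, fetch each player's page, call something like extract_wins on the response, and write all the rows to one file. Keeping "code" as the first column gives you the same key to join on in R.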

Some good references:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/

How to find elements by class
