I'm trying to scrape some data from baseball-reference.com. I've written code that gets the data from other parts of the site, where the tables are coded a little more simply, but this particular set of pages is apparently more complicated. Here's the code I have so far:
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Declare URL
test_url = 'https://www.baseball-reference.com/boxes/SLN/SLN201704020.shtml'
# Query the website and return the HTML
page = urlopen(test_url)
# Parse the HTML and store
soup = BeautifulSoup(page, 'lxml')
table = soup.find("div", {"class": "table_outer_container"})
However, this doesn't find the tables that I want (on this particular page, the two tables with At-Bats, RBIs, HRs, runs, etc.). I've also tried a few other things, e.g.:
table = soup.find_all("table", {"class": "sortable stats_table"})
but it doesn't work either. I've also tried to read the site using pandas, with no luck, so if there's an easier way with pandas, I'm open to that too.
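One thing that may be worth checking is whether the tables are wrapped in HTML comments in the raw page source, since BeautifulSoup treats commented-out markup as a single Comment string rather than as tags, which would make find and find_all come back empty. I haven't confirmed that this is what the page above does, so the snippet below is only a minimal sketch under that assumption. The sample_html string and the find_commented_tables helper are made up for illustration and are not taken from the real page; it also uses the built-in 'html.parser' instead of 'lxml' so it runs without extra dependencies.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical stand-in for a page whose table is hidden inside an HTML comment.
# This is NOT the real markup from baseball-reference.com.
sample_html = """
<div class="table_outer_container">
<!--
<table class="sortable stats_table" id="batting">
<tr><th>Batting</th><th>AB</th><th>R</th></tr>
<tr><td>Player A</td><td>4</td><td>1</td></tr>
</table>
-->
</div>
"""


def find_commented_tables(html):
    """Return every <table> tag found inside HTML comments in the document."""
    soup = BeautifulSoup(html, "html.parser")
    tables = []
    # Comment nodes are strings, so search by string and keep only comments.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if "<table" in comment:
            # Re-parse the comment body so its markup becomes real tags.
            inner = BeautifulSoup(comment, "html.parser")
            tables.extend(inner.find_all("table"))
    return tables


# A direct search sees nothing, because the table is only comment text.
direct = BeautifulSoup(sample_html, "html.parser").find_all("table")
tables = find_commented_tables(sample_html)
print(len(direct), len(tables), [t.get("id") for t in tables])
```

If the real page does turn out to work this way, the same helper could be given the text returned by urlopen(test_url).read(), and each recovered table could then be filtered by class or id as in the attempts above.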