I’m trying to retrieve the data in the “Advanced Box Score Stats” tables on the following webpage: http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html
First, I tried a broad approach, using BeautifulSoup to retrieve every table on the page:
import requests
from bs4 import BeautifulSoup
base_url = "http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html"
r = requests.get(base_url)
soup = BeautifulSoup(r.text, "html.parser")
tables = soup.find_all("table")
for table in tables:
    print(table.get_text())
This only retrieved the “Basic Box Score Stats” tables. The “Advanced Box Score Stats” tables I was hoping for were missing from the output.
Next, I tried to be more specific by using lxml with an XPath expression:
import requests
from lxml import html
page = requests.get('http://www.sports-reference.com/cbb/boxscores/2016-11-11-villanova.html')
tree = html.fromstring(page.content)
boxscore_Advanced = tree.xpath('//*[@id="box-score-advanced-lafayette"]/tbody/tr[1]/td[1]/text()')
print(boxscore_Advanced)
This returned an empty list.
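One thing I have not ruled out is that the advanced tables sit inside HTML comments in the raw page source, which would explain why neither find_all("table") nor the XPath query sees them. Below is a minimal diagnostic sketch for checking that, assuming Python 3 and only the standard library. SAMPLE_HTML is a made-up stand-in for r.text (not the real page markup), and CommentCollector and table_ids_in_comments are hypothetical helper names of my own.

```python
import re
from html.parser import HTMLParser

# Made-up stand-in for the downloaded page (r.text); the real markup may differ.
SAMPLE_HTML = """
<html><body>
<table id="box-score-basic-lafayette"><tr><td>basic</td></tr></table>
<!--
<table id="box-score-advanced-lafayette"><tr><td>advanced</td></tr></table>
-->
</body></html>
"""


class CommentCollector(HTMLParser):
    """Collects the text of every HTML comment in the document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser with the text between <!-- and -->
        self.comments.append(data)


def table_ids_in_comments(markup):
    """Return the id attributes of <table> tags that appear inside comments."""
    collector = CommentCollector()
    collector.feed(markup)
    collector.close()
    ids = []
    for comment in collector.comments:
        ids.extend(re.findall(r'<table[^>]*\bid="([^"]+)"', comment))
    return ids


commented_ids = table_ids_in_comments(SAMPLE_HTML)
print(commented_ids)
```

If the real page behaves like this sample, the id from my XPath query would show up in this list, and the comment text itself would need to be parsed as HTML to reach the table.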
I've been struggling with this for a while and have tried the suggestions in the following posts, without success:
- Why does this xpath fail using lxml in python?
- Python xpath query not returning text value
- lxml xpath unable to display html items
- http://web.stanford.edu/~zlotnick/TextAsData/Web_Scraping_with_Beautiful_Soup.html
- Unable to fetch Table from BeautifulSoup
Thank you in advance for any and all help!