I'm trying to extract the Most Common Batting Orders table from http://www.baseball-reference.com/teams/SFG/2017-batting-orders.shtml using BeautifulSoup:
import bs4
import urllib.request as urllib
url = 'http://www.baseball-reference.com/teams/SFG/2017-batting-orders.shtml'
html = urllib.urlopen(url).read()
batting_order_soup = bs4.BeautifulSoup(html, "html.parser")
table = batting_order_soup.find("table", attrs={"class":"stats_table nav_table"})
>>> print(table)
None
I would expect to see a table with the columns 6 Games, 4 Games, 4 Games, 3 Games, and 2 Games, with Span, Nunez, Belt, etc. listed under the 6 Games column.
In the browser, I see the 6 Games table both inside an HTML comment and in the rendered HTML, e.g.:
<table class="stats_table nav_table" id="st_0"><tbody><tr class="rowSum">
<td valign="top"><strong>6 Games</strong><p></p><li value="1">
<a data-entry-id="spande01" href="/players/s/spande01.shtml"
title="Denard Span">Span</a> </li>
<li value="2"><a data-entry-id="nunezed02" href="/players/n/nunezed02.shtml"
title="Eduardo Nunez">Nunez</a></li>
Is there a way within BeautifulSoup to extract the table? When I print batting_order_soup, the output contains no-js, so perhaps, as noted in the comments below, the JavaScript isn't being run. I presume we can't get bs4 to run JavaScript? Can someone provide an example of how to extract the table embedded in the comments?
This code can be run interactively. For example, if you run
table = batting_order_soup.find("table")
print(table)
You will get the first table on the page, which is the Batting Order table.
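One possible approach to the commented-out table, sketched under assumptions: bs4 exposes HTML comments as Comment nodes, so each comment can be re-parsed as its own document and then searched for the table. The sample_html string and the find_commented_tables helper are hypothetical stand-ins modelled on the snippet above (not fetched from the real page), so this is an illustration only and not a confirmed fix for the live site:

```python
import bs4
from bs4 import Comment

# Hypothetical stand-in for the downloaded page: the table sits inside an
# HTML comment, so a plain soup.find("table") cannot see it.
sample_html = """
<div>
<!--
<table class="stats_table nav_table" id="st_0"><tbody><tr class="rowSum">
<td valign="top"><strong>6 Games</strong>
<li value="1"><a href="/players/s/spande01.shtml" title="Denard Span">Span</a></li>
<li value="2"><a href="/players/n/nunezed02.shtml" title="Eduardo Nunez">Nunez</a></li>
</td></tr></tbody></table>
-->
</div>
"""


def find_commented_tables(html, css_class):
    """Return tables carrying css_class that are hidden inside HTML comments.

    Hypothetical helper, not part of bs4.
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    tables = []
    # Comments are special string nodes, so filter the strings by type.
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment body as its own small HTML document.
        inner = bs4.BeautifulSoup(comment, "html.parser")
        tables.extend(inner.find_all("table", class_=css_class))
    return tables


tables = find_commented_tables(sample_html, "nav_table")
for table in tables:
    for cell in table.find_all("td"):
        games = cell.find("strong").get_text()
        players = [a.get_text() for a in cell.find_all("a")]
        print(games, players)
```

With the real page, the html variable from the code above would be passed in place of sample_html, assuming the table is still delivered inside a comment.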
Thank you, -Raj