Accessing commented HTML Lines with BeautifulSoup

Question

I am attempting to webscrape stats from this specific webpage: https://www.sports-reference.com/cfb/schools/louisville/2016/gamelog/

However, the table for the 'Defensive Game Log' appears to be commented out when I look at the HTML source (it starts with <!-- and ends with -->).

Because of this, the following BeautifulSoup4 code only grabs the offensive data, which is not commented out. The defensive data is skipped because it sits inside a comment.

from urllib.request import Request,urlopen
from bs4 import BeautifulSoup
import re

accessurl = 'https://www.sports-reference.com/cfb/schools/oklahoma-state/2016/gamelog/'
req = Request(accessurl)
link = urlopen(req)
soup = BeautifulSoup(link.read(), "lxml")


tables = soup.find_all(['th', 'tr'])
my_table = tables[0]
rows = my_table.findChildren(['tr'])
for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print(value)

I am curious whether there is a way to collect all of the defensive values into a list, the same way the offensive data is stored, either inside or outside of BeautifulSoup4. Thanks!

Note: I built on the solution given below with code derived from https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table:

data = []

table = defensive_log
table_body = table.find('tbody')

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values
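One thing to watch in the snippet above: filtering out empty values shortens a row whenever a cell is blank, so the remaining values no longer line up with their columns. Below is a minimal sketch of a variant that keeps blanks as empty strings and also picks up `th` cells. The helper name `table_to_rows` and the sample table are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Made-up sample table; the second cell of the first row is intentionally blank.
SAMPLE_TABLE = """
<table>
  <tbody>
    <tr><th>1</th><td>2016-09-03</td><td></td><td>W</td></tr>
    <tr><th>2</th><td>2016-09-10</td><td>@</td><td>L</td></tr>
  </tbody>
</table>
"""


def table_to_rows(table):
    """Return the table body as a list of rows, keeping empty cells."""
    body = table.find("tbody")
    if body is None:
        # Fall back to the table itself when there is no <tbody>
        body = table
    rows = []
    for tr in body.find_all("tr"):
        # Include header cells (<th>) as well as data cells (<td>)
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


table = BeautifulSoup(SAMPLE_TABLE, "html.parser").find("table")
data = table_to_rows(table)
print(data)
```

Every row comes back with the same number of entries, so `data[i][j]` always refers to column `j`. Whether the real page puts values in `th` cells is something to check against its HTML.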

What do you mean by "commented out"? – snapcrack Jul 16 '17 at 01:10

score 3 · Accepted Answer · answered Jul 16 '17 at 08:50

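BeautifulSoup's Comment object will give you what you want. Find the comment nodes, then parse each comment's text as HTML so the table inside it can be searched like any other tag: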

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup, Comment

accessurl = 'https://www.sports-reference.com/cfb/schools/oklahoma-state/2016/gamelog/'
req = Request(accessurl)
link = urlopen(req)
soup = BeautifulSoup(link, "lxml")

# Collect every HTML comment node in the page
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

defensive_log = None
for comment in comments:
    # Re-parse the comment's text as HTML so its tags become searchable
    comment_soup = BeautifulSoup(str(comment), 'lxml')
    defensive_log = comment_soup.find('table')  # search as ordinary tag
    if defensive_log:
        break
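The loop above stops at the first comment that contains any table. If a page has several commented-out tables, it may be safer to select by `id`. Here is a minimal offline sketch of that idea. The helper `find_commented_table`, the sample markup, and the ids `offense` and `defense` are hypothetical, so check the real page's source for the actual ids. It uses the built-in `html.parser` so that it runs without lxml.

```python
from bs4 import BeautifulSoup, Comment

# Made-up markup: one visible table, one plain comment, one commented-out table.
SAMPLE_HTML = """
<html><body>
<table id="offense"><tbody><tr><td>35</td><td>420</td></tr></tbody></table>
<!-- just a note, no table here -->
<!--
<table id="defense"><tbody><tr><td>14</td><td>310</td></tr></tbody></table>
-->
</body></html>
"""


def find_commented_table(html, table_id):
    """Return the first commented-out <table> with the given id, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment is a string subclass, so filter the text nodes by type
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Parse the comment body as its own small HTML document
        fragment = BeautifulSoup(str(comment), "html.parser")
        table = fragment.find("table", id=table_id)
        if table is not None:
            return table
    return None


defense = find_commented_table(SAMPLE_HTML, "defense")
cells = [td.get_text(strip=True) for td in defense.find_all("td")]
print(cells)
```

Note that the helper only looks inside comments, so asking it for the visible `offense` table returns `None`.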

Dmitriy Fialkovskiy

@Storm, any feedback? Did my solution help? – Dmitriy Fialkovskiy Jul 17 '17 at 13:56
Sorry for taking so long to get back to you. I've been moving and am finally back on the project. I am running through it now to try to incorporate it. – Storm Aug 17 '17 at 19:52
I added the following code from [here](https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table). It allowed me to put this into a table. I am putting the final code in the question above. – Storm Aug 17 '17 at 20:10
Sorry, I don't completely understand you. Do you mean that you had made a workaround but are now trying to achieve the goal my way? – Dmitriy Fialkovskiy Aug 18 '17 at 17:43
