I'm trying to parse a table from the NFL standings website. I'm looking at the first table, "American Football Conference". When I visit the page in a browser and use "Inspect element", I can see `<table>`, `<tr>`, and `<td>` tags, but I can't find the table using BeautifulSoup.

from bs4 import BeautifulSoup
import requests

html_data = requests.get('https://www.nfl.com/standings').text
soup = BeautifulSoup(html_data, 'lxml')

print(soup.find('table') is None)  # prints True
print(soup.find('tr') is None)  # prints True
print(soup.find('td') is None)  # prints True
Parabolord
  • So when you `print(html_data)`, what do you get? Is there a `<table>` tag in it? (Hint: no)
    – kindall Dec 30 '17 at 02:31
  • Can you help me understand why I can see the table when using the "Inspect Element" development tool feature in most browsers, but I can't see the table when viewing the source? – Parabolord Dec 30 '17 at 02:54
  • I noticed the same. This is mere speculation, but I believe it may be the site's defense against web scraping. The source appears to contain a JSON-style listing of the data, although without access to the API it would be nearly impossible to scrape. You could use https://github.com/BurntSushi/nflgame/tree/master/nflgame, which interacts with the NFL API and returns the data. – Ajax1234 Dec 30 '17 at 03:01
  • The page uses JavaScript to add elements, but BeautifulSoup can't run JavaScript, so it can't find them. You can use `Selenium` to control a browser, which will load the page and run the JavaScript. Or you can check in DevTools (Network->XHR) whether the JavaScript reads this data from a different URL, and then try to read from that URL too. JavaScript mostly gets data as JSON, which you can easily convert to a Python dictionary using the `json` module. See https://feeds.nfl.com/feeds-rs/scores.json – furas Dec 30 '17 at 05:23
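
A rough sketch of the two checks suggested in the comments above, using only the standard library. It first tests whether the raw HTML (what `requests` sees, before any JavaScript runs) contains a `<table>` tag at all, then outlines the shape of a JSON feed so you can see where the data lives. The helper names `fetch_text`, `has_tag`, `describe`, and `main` are made up for this example, the layout of the feed is not assumed, and the feed URL from the comment may no longer respond.

```python
import json
import re
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10):
    """Download a URL and return the body as text. No JavaScript is executed."""
    # Some sites reject the default urllib User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def has_tag(html, tag):
    """Return True if the raw markup contains an opening <tag> (case-insensitive)."""
    # Require whitespace, '>' or '/' after the name so 'table' does not match '<tablet>'.
    pattern = rf"<{re.escape(tag)}[\s>/]"
    return re.search(pattern, html, re.IGNORECASE) is not None


def describe(data, depth=2, indent=0):
    """Return lines outlining the keys and value types of parsed JSON."""
    pad = "  " * indent
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            if depth > 1:
                lines.extend(describe(value, depth - 1, indent + 1))
    elif isinstance(data, list) and data:
        # Only the first element is inspected; lists are assumed to be homogeneous.
        lines.append(f"{pad}[0]: {type(data[0]).__name__} ({len(data)} items)")
        if depth > 1:
            lines.extend(describe(data[0], depth - 1, indent + 1))
    return lines


def main():
    """Run both checks against the live site (needs network access)."""
    html = fetch_text("https://www.nfl.com/standings")
    print("table tag in raw HTML:", has_tag(html, "table"))

    # URL taken from the comment above; it may have moved or been retired.
    feed = json.loads(fetch_text("https://feeds.nfl.com/feeds-rs/scores.json"))
    print("\n".join(describe(feed)))
```

Call `main()` to try it against the live pages. If `has_tag` reports `False` for the raw HTML while "Inspect element" shows a table, the table is being built by JavaScript after the page loads, and the output of `describe` is a starting point for finding the same data in the feed.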
