1

I am trying to get a list of all NFL teams from a website and I am very close. I am able to get some data, but I can't drill down far enough to get what I want.

My code:

from bs4 import BeautifulSoup
import requests

f = open('C:\Users\Josh\Documents\Python\outFileRoto.txt', 'w')
errorFile = open('C:\Users\Josh\Documents\Python\errors.txt', 'w')



r  = requests.get('https://rotogrinders.com/team-stats/nfl-allowed?sport=nfl&position=QB&site=draftkings&range=season')
data = r.text
#soup = BeautifulSoup(urllib2.urlopen('http://games.espn.com/ffl/tools/projections?startIndex=' +str(x).read(), 'html')
soup = BeautifulSoup(data, 'html.parser')
leftTable = soup.find('div', attrs={'class' : 'rgt-bdy left'})
#f.write("LEFT TABLE\n" + str(leftTable) + '\n')

rightCol = leftTable.find('div', attrs={'class' : 'rgt-colwrap'})

for row in rightCol.findAll('div'):
    #col = row.findAll('div')
    #f.write("col" + str(col))

    try:
        name = str(row)
        f.write("----------------------------COLUMN---------------------------\n" + name + '\n')
    
    except Exception as e:
        # log the error together with the row that caused it
        errorFile.write(str(e) + ">>>>>>>>>>>>ROW" + str(row) + '\n')
        

f.close()
errorFile.close()

The problem is that I get this:

----------------------------COLUMN---------------------------
<div class="rgt-col">
<div class="rgt-hdr">Team<span class="icn-arw-down"></span></div>
</div>
----------------------------COLUMN---------------------------
<div class="rgt-hdr">Team<span class="icn-arw-down"></span></div>
----------------------------COLUMN---------------------------
<div class="rgt-col">
<div class="rgt-hdr">Abbr<span class="icn-arw-down"></span></div>
</div>
----------------------------COLUMN---------------------------
<div class="rgt-hdr">Abbr<span class="icn-arw-down"></span></div>

But I need this:

NFL Teams

Community
Jay

3 Answers

2

The data is embedded as JSON in the page source, inside the $(document).ready(function() call that builds the table you see in your browser. You just need to find the right script tag with bs4, pull the array out with a regex, and pass the result to json.loads to get a list of dicts:

In [1]: from bs4 import BeautifulSoup

In [2]: import requests

In [3]: import re

In [4]: import json

In [5]: soup = BeautifulSoup(requests.get("https://rotogrinders.com/team-stats/nfl-allowed?sport=nfl&position=QB&site=draftkings&range=season").content, "html.parser")

In [6]: script = soup.find("script", text=re.compile(r'data\s+=\s+')).text

In [7]: data = json.loads(re.search(r"data\s+=\s+(\[.*?\])", script).group(1))

In [8]: print(data)
[{u'fuml': 0, u'tyds': 11, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'63.64%', u'tchs': 7, u'rutd': 0, u'payds': 371, u'rec': 0, u'ruyds': 11, u'patd': 2, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'24.94', u'ruypc': u'1.57', u'att': 55, u'ruatt': 7, u'team': u'Baltimore Ravens', u'reyds': 0, u'cmp': 35, u'abbr': u'BAL'}, {u'fuml': 0, u'tyds': 29, u'tar': 0, u'gp': 2, u'int': 3, u'rztar': 0, u'retd': 0, u'pct': u'52.78%', u'tchs': 5, u'rutd': 0, u'payds': 448, u'rec': 0, u'ruyds': 29, u'patd': 5, u'reypc': u'0.00', u'rzatt': 2, u'fpts': u'40.82', u'ruypc': u'5.80', u'att': 72, u'ruatt': 5, u'team': u'Cincinnati Bengals', u'reyds': 0, u'cmp': 38, u'abbr': u'CIN'}, {u'fuml': 0, u'tyds': 2, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'57.32%', u'tchs': 3, u'rutd': 0, u'payds': 580, u'rec': 0, u'ruyds': 2, u'patd': 4, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'40.40', u'ruypc': u'0.67', u'att': 82, u'ruatt': 3, u'team': u'Cleveland Browns', u'reyds': 0, u'cmp': 47, u'abbr': u'CLE'}, {u'fuml': 0, u'tyds': 15, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'62.89%', u'tchs': 3, u'rutd': 0, u'payds': 695, u'rec': 0, u'ruyds': 15, u'patd': 1, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'34.30', u'ruypc': u'5.00', u'att': 97, u'ruatt': 3, u'team': u'Pittsburgh Steelers', u'reyds': 0, u'cmp': 61, u'abbr': u'PIT'}, {u'fuml': 0, u'tyds': 24, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'62.32%', u'tchs': 10, u'rutd': 0, u'payds': 421, u'rec': 0, u'ruyds': 24, u'patd': 3, u'reypc': u'0.00', u'rzatt': 2, u'fpts': u'33.24', u'ruypc': u'2.40', u'att': 69, u'ruatt': 10, u'team': u'Chicago Bears', u'reyds': 0, u'cmp': 43, u'abbr': u'CHI'}, {u'fuml': 0, u'tyds': 31, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'70.00%', u'tchs': 6, u'rutd': 0, u'payds': 623, u'rec': 0, u'ruyds': 31, u'patd': 6, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'56.02', u'ruypc': u'5.17', u'att': 80, 
u'ruatt': 6, u'team': u'Detroit Lions', u'reyds': 0, u'cmp': 56, u'abbr': u'DET'}, {u'fuml': 0, u'tyds': -1, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'65.71%', u'tchs': 3, u'rutd': 0, u'payds': 606, u'rec': 0, u'ruyds': -1, u'patd': 3, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'38.14', u'ruypc': u'-0.33', u'att': 70, u'ruatt': 3, u'team': u'Green Bay Packers', u'reyds': 0, u'cmp': 46, u'abbr': u'GBP'}, {u'fuml': 2, u'tyds': 48, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'58.44%', u'tchs': 7, u'rutd': 1, u'payds': 484, u'rec': 0, u'ruyds': 48, u'patd': 3, u'reypc': u'0.00', u'rzatt': 2, u'fpts': u'41.16', u'ruypc': u'6.86', u'att': 77, u'ruatt': 7, u'team': u'Minnesota Vikings', u'reyds': 0, u'cmp': 45, u'abbr': u'MIN'}, {u'fuml': 1, u'tyds': -3, u'tar': 0, u'gp': 1, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'67.65%', u'tchs': 4, u'rutd': 0, u'payds': 258, u'rec': 0, u'ruyds': -3, u'patd': 1, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'13.02', u'ruypc': u'-0.75', u'att': 34, u'ruatt': 4, u'team': u'Buffalo Bills', u'reyds': 0, u'cmp': 23, u'abbr': u'BUF'}, {u'fuml': 1, u'tyds': 28, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'64.56%', u'tchs': 8, u'rutd': 0, u'payds': 584, u'rec': 0, u'ruyds': 28, u'patd': 4, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'43.16', u'ruypc': u'3.50', u'att': 79, u'ruatt': 8, u'team': u'Miami Dolphins', u'reyds': 0, u'cmp': 51, u'abbr': u'MIA'}, {u'fuml': 0, u'tyds': 36, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'68.29%', u'tchs': 8, u'rutd': 0, u'payds': 660, u'rec': 0, u'ruyds': 36, u'patd': 4, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'47.00', u'ruypc': u'4.50', u'att': 82, u'ruatt': 8, u'team': u'New England Patriots', u'reyds': 0, u'cmp': 56, u'abbr': u'NEP'}, {u'fuml': 0, u'tyds': 7, u'tar': 0, u'gp': 1, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'76.67%', u'tchs': 3, u'rutd': 0, u'payds': 366, u'rec': 0, u'ruyds': 7, u'patd': 1, 
u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'21.34', u'ruypc': u'2.33', u'att': 30, u'ruatt': 3, u'team': u'New York Jets', u'reyds': 0, u'cmp': 23, u'abbr': u'NYJ'}, {u'fuml': 2, u'tyds': 14, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'54.55%', u'tchs': 4, u'rutd': 0, u'payds': 402, u'rec': 0, u'ruyds': 14, u'patd': 1, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'21.48', u'ruypc': u'3.50', u'att': 66, u'ruatt': 4, u'team': u'Houston Texans', u'reyds': 0, u'cmp': 36, u'abbr': u'HOU'}, {u'fuml': 0, u'tyds': 12, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'73.61%', u'tchs': 3, u'rutd': 0, u'payds': 606, u'rec': 0, u'ruyds': 12, u'patd': 3, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'41.44', u'ruypc': u'4.00', u'att': 72, u'ruatt': 3, u'team': u'Indianapolis Colts', u'reyds': 0, u'cmp': 53, u'abbr': u'IND'}, {u'fuml': 1, u'tyds': 25, u'tar': 0, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'62.71%', u'tchs': 7, u'rutd': 1, u'payds': 419, u'rec': 0, u'ruyds': 25, u'patd': 6, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'51.26', u'ruypc': u'3.57', u'att': 59, u'ruatt': 7, u'team': u'Jacksonville Jaguars', u'reyds': 0, u'cmp': 37, u'abbr': u'JAC'}, {u'fuml': 0, u'tyds': 39, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'54.79%', u'tchs': 4, u'rutd': 0, u'payds': 496, u'rec': 0, u'ruyds': 39, u'patd': 1, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'29.74', u'ruypc': u'9.75', u'att': 73, u'ruatt': 4, u'team': u'Tennessee Titans', u'reyds': 0, u'cmp': 40, u'abbr': u'TEN'}, {u'fuml': 0, u'tyds': 20, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'63.51%', u'tchs': 2, u'rutd': 0, u'payds': 571, u'rec': 0, u'ruyds': 20, u'patd': 4, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'41.84', u'ruypc': u'10.00', u'att': 74, u'ruatt': 2, u'team': u'Dallas Cowboys', u'reyds': 0, u'cmp': 47, u'abbr': u'DAL'}, {u'fuml': 0, u'tyds': 12, u'tar': 0, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'60.67%', 
u'tchs': 2, u'rutd': 0, u'payds': 490, u'rec': 0, u'ruyds': 12, u'patd': 1, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'27.80', u'ruypc': u'6.00', u'att': 89, u'ruatt': 2, u'team': u'New York Giants', u'reyds': 0, u'cmp': 54, u'abbr': u'NYG'}, {u'fuml': 1, u'tyds': 37, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'60.00%', u'tchs': 5, u'rutd': 0, u'payds': 425, u'rec': 0, u'ruyds': 37, u'patd': 0, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'20.70', u'ruypc': u'7.40', u'att': 55, u'ruatt': 5, u'team': u'Philadelphia Eagles', u'reyds': 0, u'cmp': 33, u'abbr': u'PHI'}, {u'fuml': 0, u'tyds': 4, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'73.13%', u'tchs': 2, u'rutd': 1, u'payds': 592, u'rec': 0, u'ruyds': 4, u'patd': 3, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'44.08', u'ruypc': u'2.00', u'att': 67, u'ruatt': 2, u'team': u'Washington Redskins', u'reyds': 0, u'cmp': 49, u'abbr': u'WAS'}, {u'fuml': 0, u'tyds': 13, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'73.08%', u'tchs': 6, u'rutd': 0, u'payds': 580, u'rec': 0, u'ruyds': 13, u'patd': 7, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'54.50', u'ruypc': u'2.17', u'att': 78, u'ruatt': 6, u'team': u'Atlanta Falcons', u'reyds': 0, u'cmp': 57, u'abbr': u'ATL'}, {u'fuml': 0, u'tyds': 30, u'tar': 0, u'gp': 2, u'int': 4, u'rztar': 0, u'retd': 0, u'pct': u'56.45%', u'tchs': 8, u'rutd': 1, u'payds': 421, u'rec': 0, u'ruyds': 30, u'patd': 3, u'reypc': u'0.00', u'rzatt': 2, u'fpts': u'36.84', u'ruypc': u'3.75', u'att': 62, u'ruatt': 8, u'team': u'Carolina Panthers', u'reyds': 0, u'cmp': 35, u'abbr': u'CAR'}, {u'fuml': 1, u'tyds': 12, u'tar': 0, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'70.89%', u'tchs': 6, u'rutd': 0, u'payds': 687, u'rec': 0, u'ruyds': 12, u'patd': 1, u'reypc': u'0.00', u'rzatt': 3, u'fpts': u'38.68', u'ruypc': u'2.00', u'att': 79, u'ruatt': 6, u'team': u'New Orleans Saints', u'reyds': 0, u'cmp': 56, u'abbr': u'NOS'}, {u'fuml': 0, 
u'tyds': 10, u'tar': 0, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'62.16%', u'tchs': 3, u'rutd': 0, u'payds': 653, u'rec': 0, u'ruyds': 10, u'patd': 5, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'52.12', u'ruypc': u'3.33', u'att': 74, u'ruatt': 3, u'team': u'Tampa Bay Buccaneers', u'reyds': 0, u'cmp': 46, u'abbr': u'TBB'}, {u'fuml': 1, u'tyds': 76, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'53.42%', u'tchs': 14, u'rutd': 1, u'payds': 391, u'rec': 0, u'ruyds': 76, u'patd': 2, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'37.24', u'ruypc': u'5.43', u'att': 73, u'ruatt': 14, u'team': u'Denver Broncos', u'reyds': 0, u'cmp': 39, u'abbr': u'DEN'}, {u'fuml': 0, u'tyds': 6, u'tar': 0, u'gp': 2, u'int': 2, u'rztar': 0, u'retd': 0, u'pct': u'63.77%', u'tchs': 4, u'rutd': 0, u'payds': 511, u'rec': 0, u'ruyds': 6, u'patd': 2, u'reypc': u'0.00', u'rzatt': 0, u'fpts': u'30.04', u'ruypc': u'1.50', u'att': 69, u'ruatt': 4, u'team': u'Kansas City Chiefs', u'reyds': 0, u'cmp': 44, u'abbr': u'KCC'}, {u'fuml': 1, u'tyds': 5, u'tar': 0, u'gp': 2, u'int': 1, u'rztar': 0, u'retd': 0, u'pct': u'71.05%', u'tchs': 2, u'rutd': 0, u'payds': 819, u'rec': 0, u'ruyds': 5, u'patd': 7, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'64.26', u'ruypc': u'2.50', u'att': 76, u'ruatt': 2, u'team': u'Oakland Raiders', u'reyds': 0, u'cmp': 54, u'abbr': u'OAK'}, {u'fuml': 1, u'tyds': 49, u'tar': 0, u'gp': 2, u'int': 3, u'rztar': 0, u'retd': 0, u'pct': u'66.33%', u'tchs': 7, u'rutd': 1, u'payds': 692, u'rec': 0, u'ruyds': 49, u'patd': 4, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'53.58', u'ruypc': u'7.00', u'att': 98, u'ruatt': 7, u'team': u'San Diego Chargers', u'reyds': 0, u'cmp': 65, u'abbr': u'SDC'}, {u'fuml': 2, u'tyds': 24, u'tar': 1, u'gp': 2, u'int': 4, u'rztar': 0, u'retd': 0, u'pct': u'60.00%', u'tchs': 8, u'rutd': 0, u'payds': 507, u'rec': 1, u'ruyds': 21, u'patd': 2, u'reypc': u'3.00', u'rzatt': 1, u'fpts': u'28.68', u'ruypc': u'3.00', u'att': 85, u'ruatt': 7, 
u'team': u'Arizona Cardinals', u'reyds': 3, u'cmp': 51, u'abbr': u'ARI'}, {u'fuml': 0, u'tyds': 41, u'tar': 1, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'62.86%', u'tchs': 15, u'rutd': 0, u'payds': 424, u'rec': 1, u'ruyds': 57, u'patd': 1, u'reypc': u'-16.00', u'rzatt': 1, u'fpts': u'29.06', u'ruypc': u'4.07', u'att': 70, u'ruatt': 14, u'team': u'Los Angeles Rams', u'reyds': -16, u'cmp': 44, u'abbr': u'LAR'}, {u'fuml': 1, u'tyds': 47, u'tar': 0, u'gp': 2, u'int': 3, u'rztar': 0, u'retd': 0, u'pct': u'54.67%', u'tchs': 9, u'rutd': 0, u'payds': 483, u'rec': 0, u'ruyds': 47, u'patd': 4, u'reypc': u'0.00', u'rzatt': 1, u'fpts': u'39.02', u'ruypc': u'5.22', u'att': 75, u'ruatt': 9, u'team': u'San Francisco Niners', u'reyds': 0, u'cmp': 41, u'abbr': u'SFO'}, {u'fuml': 0, u'tyds': 22, u'tar': 0, u'gp': 2, u'int': 0, u'rztar': 0, u'retd': 0, u'pct': u'57.63%', u'tchs': 8, u'rutd': 1, u'payds': 425, u'rec': 0, u'ruyds': 22, u'patd': 0, u'reypc': u'0.00', u'rzatt': 2, u'fpts': u'28.20', u'ruypc': u'2.75', u'att': 59, u'ruatt': 8, u'team': u'Seattle Seahawks', u'reyds': 0, u'cmp': 34, u'abbr': u'SEA'}]

In [9]: print([d["team"] for d in data])
[u'Baltimore Ravens', u'Cincinnati Bengals', u'Cleveland Browns', u'Pittsburgh Steelers', u'Chicago Bears', u'Detroit Lions', u'Green Bay Packers', u'Minnesota Vikings', u'Buffalo Bills', u'Miami Dolphins', u'New England Patriots', u'New York Jets', u'Houston Texans', u'Indianapolis Colts', u'Jacksonville Jaguars', u'Tennessee Titans', u'Dallas Cowboys', u'New York Giants', u'Philadelphia Eagles', u'Washington Redskins', u'Atlanta Falcons', u'Carolina Panthers', u'New Orleans Saints', u'Tampa Bay Buccaneers', u'Denver Broncos', u'Kansas City Chiefs', u'Oakland Raiders', u'San Diego Chargers', u'Arizona Cardinals', u'Los Angeles Rams', u'San Francisco Niners', u'Seattle Seahawks']

On a side note, use raw strings for your paths and open your files using with.

with open(r'C:\Users\Josh\Documents\Python\outFileRoto.txt', 'w') as f:
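To make the side note concrete, here is a minimal sketch. It assumes a two-item stand-in list named teams and writes to the system temp directory so it runs anywhere; windows_path is only there to show that a raw string keeps its backslashes.

```python
import os
import tempfile

# Hypothetical stand-in for the scraped team names.
teams = ["Baltimore Ravens", "Cincinnati Bengals"]

# A raw string keeps every backslash literal, so \U is not read as an escape.
windows_path = r'C:\Users\Josh\Documents\Python\outFileRoto.txt'

# Write to the temp directory here so the sketch runs on any machine.
out_path = os.path.join(tempfile.gettempdir(), "outFileRoto.txt")

# The with block closes the file on exit, even if a write raises.
with open(out_path, "w") as f:
    for team in teams:
        f.write(team + "\n")

# Read the file back to confirm what was written.
with open(out_path) as f:
    lines = f.read().splitlines()

print(lines)
```

With this pattern there is no separate close call to forget.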
Padraic Cunningham
  • Wow, thank you! Is this code specific to this site only? What I mean is, do I always use (r'data\s+=\s+') and (r"data\s+=\s+(\[.*?\])") to get JSON data from any site? Or do I have to change that code depending on how the site is built? – Jay Sep 20 '16 at 15:44
  • @LikeWhiteOnRice, it is specific to this site. Some sites render the content dynamically through JavaScript or CSS, and some give you source that is pretty much what you see in the browser. If you look at the source returned here, you can see the JSON data. If you want to scrape a website, check the actual source returned, as it is often not what you see in your browser; Chrome's developer tools and Firebug are essential for this. – Padraic Cunningham Sep 20 '16 at 15:51
1

To drill down further, have BeautifulSoup return the div that has the class "rgt-col" and the style "display: block;".

Once you have that, drill down further by finding all the divs within that div, but ignoring the first result. Or you can also get all the divs that do not have a class.
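A rough sketch of both options follows. The markup in sample_html is hypothetical: it assumes the rendered column holds a header div followed by class-less divs with the team names, which is not what requests actually returns for this page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered column (an assumption).
sample_html = """
<div class="rgt-colwrap">
  <div class="rgt-col" style="display: block;">
    <div class="rgt-hdr">Team</div>
    <div>Baltimore Ravens</div>
    <div>Cincinnati Bengals</div>
  </div>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Find the column by its class and its inline style.
column = soup.find("div", attrs={"class": "rgt-col", "style": "display: block;"})

# Option 1: take every div inside the column except the first one (the header).
by_position = [div.get_text(strip=True) for div in column.find_all("div")[1:]]

# Option 2: keep only the divs that carry no class attribute.
by_missing_class = [
    div.get_text(strip=True)
    for div in column.find_all("div")
    if not div.has_attr("class")
]

print(by_position)
print(by_missing_class)
```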

EDIT 1: This answer was written on the assumption that the HTML was already available and that all that was needed was to drill down to the specific elements. However, as mentioned by Padraic Cunningham and Casey Wireman, the desired data is loaded dynamically, so that HTML is not available in the first place. The first step would therefore be to obtain it, maybe by identifying and loading the JSON endpoint, or by using a browser automation tool such as selenium.

EDIT 2: In this case however, it seems that the desired data is already in the html, in json format. All that's left is to parse this as was done by Padraic Cunningham in his answer.

B B
  • What code do I use to find all divs without a class? – Jay Sep 20 '16 at 04:19
  • Here are a couple of ways to do it: http://stackoverflow.com/questions/9061094/beautiful-soup-extract-element-with-no-class-attribute and http://stackoverflow.com/questions/18443694/how-to-extract-values-with-beautifulsoup-with-no-class – B B Sep 20 '16 at 04:36
  • There is no data to get, it is dynamically created. – Padraic Cunningham Sep 20 '16 at 15:36
  • Thanks, I made the assumption that the html was already rendered and available, and that all that was needed was isolating the team names. I've checked the html source, it does seem like the team names are embedded in it, just in json format. I've adjusted my answer to reduce confusion. – B B Sep 20 '16 at 16:53
0

It looks like your main problem is that the table you're interested in is dynamically built by JS. For info on scraping dynamically loaded content, see the answer to "How to retrieve the values of dynamic html content using Python".

Alternatively, it looks like the page includes the initialization data that the table is generated from. You could scrape that and parse the array directly if you don't want to go to the trouble of setting up things like selenium.
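Scraping the initialization data could look roughly like the sketch below. The sample_html string and the extract_team_names helper are made up for illustration, and the sketch assumes the array is assigned to a variable named data and contains no nested "]" characters.

```python
import json
import re

# Hypothetical stand-in for the page source; the real array is much larger.
sample_html = """
<script>
$(document).ready(function() {
    var data = [{"team": "Baltimore Ravens", "abbr": "BAL"},
                {"team": "Cincinnati Bengals", "abbr": "CIN"}];
});
</script>
"""


def extract_team_names(html):
    """Return the "team" value of every record in the embedded data array."""
    # re.DOTALL lets the array span several lines; the non-greedy match
    # stops at the first "]", so this assumes the records hold no lists.
    match = re.search(r"data\s+=\s+(\[.*?\])", html, re.DOTALL)
    if match is None:
        return []
    return [record["team"] for record in json.loads(match.group(1))]


teams = extract_team_names(sample_html)
print(teams)
```

On the live page you would pass the text returned by requests instead of sample_html. The pattern is tied to how this site names its variable, so it would need adjusting for other sites.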

Community