I am trying to scrape a table from a public site using Python. I confirmed that the script connects to the site by evaluating page_soup.p, which returned the first element with a 'p' tag.

When I check whether my selector works by evaluating containers[0], I get:

Traceback (most recent call last):

File "", line 1, in

IndexError: list index out of range

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://overwatchleague.com/en-us/stats'

# opening up connect, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# grabs each player
containers = page_soup.findAll("tr",{"class":"Table-row"})

There should be roughly 183 rows with that class, so an empty list is not what I expected. Any insight into what I did wrong?

Noobguru
  • A JavaScript library renders the rows with that class in the browser *after* the page has loaded. Look at the page source (even in the browser) and you will see they are not there, which is why BeautifulSoup cannot find them. – sal Jun 28 '19 at 05:36
  • Check out this post: https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – sal Jun 28 '19 at 05:38
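
The point made in the comments can be checked without a browser: count the matching rows in the HTML exactly as the server delivers it. The sketch below is only an illustration and uses the standard library instead of BeautifulSoup so that it runs without extra packages. RowCounter, count_rows, and both HTML strings are hypothetical stand-ins, not the real markup of the stats page.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> start tags that carry a given class in raw HTML."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "tr" and self.class_name in classes:
            self.count += 1


def count_rows(html, class_name="Table-row"):
    parser = RowCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical example of what a server might send: an empty shell plus a script
served_html = (
    '<html><body><p>Stats</p><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)
# Hypothetical example of the DOM after JavaScript has filled in the table
rendered_html = (
    '<table><tr class="Table-row"><td>Ado</td></tr>'
    '<tr class="Table-row"><td>Adora</td></tr></table>'
)

print(count_rows(served_html))    # 0
print(count_rows(rendered_html))  # 2
```

If the count is 0 for the downloaded page, indexing the result list raises the IndexError from the question, so the rows have to come from wherever the script gets its data.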

1 Answer

The data is loaded as JSON from a separate endpoint, not embedded in the page's HTML. To find the correct URL, look at the network requests the page makes, e.g. in the Firefox developer tools:

import requests
from datetime import timedelta

# JSON endpoint the stats page requests in the background
url = 'https://api.overwatchleague.com/stats/players?stage_id=regular_season&season=2019'

data = requests.get(url).json()

# header row, then a rule as wide as the four columns combined
print('{:^12}{:^12}{:^12}{:^20}'.format('Name', 'Team', 'Deaths', 'Time Played'))
print('-' * (12*3+20))
for row in data['data']:
    print('{:^12}'.format(row['name']), end='')
    print('{:^12}'.format(row['team']), end='')
    print('{:^12.2f}'.format(row['deaths_avg_per_10m']), end='')
    # the value is passed as seconds; str(timedelta) renders it as H:MM:SS
    t = timedelta(seconds=float(row['time_played_total']))
    print('{:>20}'.format(str(t)))

Prints:

    Name        Team       Deaths       Time Played     
--------------------------------------------------------
    Ado         WAS         5.47         15:23:08.217194
   Adora        HZS         3.72          9:08:57.586787
 Agilities      VAL         5.27         17:16:59.668653
    Aid         TOR         5.08          8:02:19.102897
   AimGod       BOS         4.69         17:04:31.769137
    aKm         DAL         4.64         16:57:14.261245
   alemao       BOS         4.99          2:36:25.171021
   ameng        CDH         6.24         16:06:12.084212
   Anamo        NYE         2.36         17:33:31.143450
 Architect      SFS         4.33          3:18:45.065564
   ArHaN        HOU         6.39          1:54:10.439213
    ArK         WAS         2.50          9:32:57.421203

...and so on.
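
As a possible variation, the formatting can be separated from the download so that it can be checked without hitting the network. This is a minimal sketch: format_rows is a hypothetical helper, the sample payload contains made-up values, and it assumes the response keeps the shape used above (a 'data' list whose items have 'name', 'team', 'deaths_avg_per_10m', and 'time_played_total').

```python
from datetime import timedelta


def format_rows(payload):
    """Return one formatted line per player in an already-decoded payload."""
    lines = []
    # .get() with a default avoids a KeyError if 'data' is missing
    for row in payload.get('data', []):
        played = timedelta(seconds=float(row['time_played_total']))
        lines.append('{:^12}{:^12}{:^12.2f}{:>20}'.format(
            row['name'],
            row['team'],
            row['deaths_avg_per_10m'],
            str(played),
        ))
    return lines


# Made-up sample in the assumed shape, not real API output
sample = {
    'data': [
        {'name': 'Ado', 'team': 'WAS',
         'deaths_avg_per_10m': 5.47, 'time_played_total': 3600},
    ]
}

for line in format_rows(sample):
    print(line)
```

With the real endpoint, the same function would be called as format_rows(requests.get(url).json()).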
Andrej Kesely