
I am making a username scraper and I really can't understand why the HTML is 'disappearing' when I parse it. Let's take this site for example: http://www.lolking.net/leaderboards#/eune/1

HTML output

See how there is a tbody with a bunch of tables in it? Well, when I parse the page and print it to the shell, the tbody is empty:

   <div style="background: #333; box-shadow: 0 0 2px #000; padding: 10px;">
    <table class="lktable" id="leaderboard_table" width="100%">
     <thead>
      <tr>
       <th style="width: 80px;">
        Rank
       </th>
       <th style="width: 80px;">
        Change
       </th>
       <th style="width: 100px;">
        Tier
       </th>
       <th>
        Summoner
       </th>
       <th style="width: 150px;">
        Top Champions
       </th>
      </tr>
     </thead>
     <tbody>
     </tbody>
    </table>
   </div>
  </div>

Why is this happening and how can I fix it?

T0xicCode
edsheeran
  • It looks as if the table contents are generated using JavaScript. BeautifulSoup doesn't execute JavaScript, so the table is empty. Take a look at Selenium instead. – Aaron Christiansen Aug 23 '16 at 12:36
  • Take a look [here](http://stackoverflow.com/questions/8047666/how-to-combine-scrapy-and-htmlunit-to-crawl-urls-with-javascript) and [here](http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax). You may find some useful info. – dot.Py Aug 23 '16 at 12:39
  • You don't need Selenium, just mimic the ajax requests and you can get all the data in json format – Padraic Cunningham Aug 23 '16 at 13:00
  • @PadraicCunningham, I came to the same conclusion. But he accepted another answer. – Nathan Aug 23 '16 at 13:02

3 Answers


This site needs JavaScript to work. JavaScript is used to populate the table by forming a web request, which probably points to a back-end API. This means that the "raw" HTML, without the effects of any JavaScript, has an empty table.

We can actually see this empty table in the background if we visit the site with JavaScript disabled:

Screenshot

BeautifulSoup only parses the HTML it is given; it doesn't execute this JavaScript. Instead, have a look at tools which do, such as Selenium, which drives a real browser.
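To make the empty table concrete, here is a minimal sketch of what any plain HTML parser sees. It uses Python's standard-library `html.parser` rather than BeautifulSoup, only so that it runs without extra packages. The `RAW_HTML` string is a trimmed, hypothetical stand-in for the page source, and the script body in it is purely illustrative.

```python
from html.parser import HTMLParser

# Trimmed stand-in for the raw page source: an empty <tbody> plus a script
# that would fill it in a browser. The script body is purely illustrative.
RAW_HTML = """
<table id="leaderboard_table">
  <thead><tr><th>Rank</th><th>Summoner</th></tr></thead>
  <tbody></tbody>
</table>
<script>/* in a browser, JavaScript would fetch rows and add them here */</script>
"""


class BodyRowCounter(HTMLParser):
    """Counts <tr> elements that appear inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False


counter = BodyRowCounter()
counter.feed(RAW_HTML)
# The parser never runs the script, so no body rows are found.
print(counter.rows)
```

The header row is skipped because it sits in the thead, and the script is treated as inert text, which is the same situation as in the question.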

Aaron Christiansen

As you can see in Chrome Dev Tools, the site sends 2 XHR requests to get the data, and displays it by using JavaScript.

Since BeautifulSoup is only an HTML parser, it will not execute JavaScript. You could use a tool like Selenium, which drives a real browser.

But in this case you might be better off using the API they use to get the data. You can easily see which URLs they get the data from by looking in the 'Network' tab. Reload the page, select XHR, and you can use the info to create your own requests using something like Python Requests.
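As a rough sketch of recreating such a request, the snippet below builds a GET request with headers similar to a browser XHR call. It uses the standard-library `urllib` so it has no dependencies; with Requests the equivalent would be along the lines of `requests.get(url, headers=headers).json()`. The URL is a placeholder for whatever you copy from the Network tab, and whether the site checks the `X-Requested-With` header at all is an assumption.

```python
import json
import urllib.request


def build_xhr_request(url):
    """Build a GET request resembling the browser's XHR call."""
    headers = {
        "Accept": "application/json",
        # Assumption: some sites check this header; this one may not.
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_json(url, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_xhr_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Placeholder: substitute the URL you copied from the Network tab.
XHR_URL = "http://example.com/copied-from-network-tab.json"
req = build_xhr_request(XHR_URL)
print(req.get_method(), req.full_url)
```

Only the request is built here; call `fetch_json` with the real URL to actually download and decode the data.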

Nathan

You can get all the data in JSON format. All you need to do is parse a value from a script tag inside the original page source and pass it to "http://www.lolking.net/leaderboards/some_value_here/eune/1.json":

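from bs4 import BeautifulSoup
import requests
import re

# The page's inline script calls $.get('/leaderboards/<value>/...'); capture <value>.
patt = re.compile(r"\$\.get\('/leaderboards/(\w+)/")
js = "http://www.lolking.net/leaderboards/{}/eune/1.json"

# Fetch the raw page source and parse it with an explicitly chosen parser.
page = requests.get("http://www.lolking.net/leaderboards#/eune/1")
soup = BeautifulSoup(page.content, "html.parser")

# Find the inline <script> whose text contains the $.get call.
script = soup.find("script", string=patt)

val = patt.search(script.string).group(1)
data = requests.get(js.format(val)).json()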
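The token extraction in the snippet above can be tried offline. This is a small sketch run against a made-up inline script: the `sample_script` text and the token `abc123` are hypothetical, and the real page's script may be laid out differently.

```python
import re

# Same pattern as in the answer, written as a raw string.
patt = re.compile(r"\$\.get\('/leaderboards/(\w+)/")

# Hypothetical inline script; the token "abc123" is made up for illustration.
sample_script = """
$(function () {
    $.get('/leaderboards/abc123/eune/1.json', function (resp) { render(resp); });
});
"""

match = patt.search(sample_script)
token = match.group(1)
url = "http://www.lolking.net/leaderboards/{}/eune/1.json".format(token)
print(token)
print(url)
```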

data is then a dict parsed from the JSON response, containing all the player info, like:

{'data': [{'division': '1',
           'global_ranking': '12',
           'league_points': '1217',
           'lks': '2961',
           'losses': '31',
           'most_played_champions': [{'assists': '238',
                                      'champion_id': '236',
                                      'creep_score': '7227',
                                      'deaths': '131',
                                      'kills': '288',
                                      'losses': '5',
                                      'played': '39',
                                      'wins': '34'},
                                     {'assists': '209',
                                      'champion_id': '429',
                                      'creep_score': '5454',
                                      'deaths': '111',
                                      'kills': '204',
                                      'losses': '3',
                                      'played': '27',
                                      'wins': '24'},
                                     {'assists': '155',
                                      'champion_id': '81',
                                      'creep_score': '4800',
                                      'deaths': '103',
                                      'kills': '168',
                                      'losses': '8',
                                      'played': '26',
                                      'wins': '18'}],
           'name': 'Sadastyczny',
           'previous_ranking': '2',
           'profile_icon_id': 7,
           'ranking': '1',
           'region': 'eune',
           'summoner_id': '42893043',
           'tier': '6',
           'tier_name': 'CHALLENGER',
           'wins': '128'},
          {'division': '1',
           'global_ranking': '30',
           'league_points': '1128',
           'lks': '2956',
           'losses': '180',
           'most_played_champions': [{'assists': '928',
                                      'champion_id': '24',
                                      'creep_score': '37601',
                                      'deaths': '1426',
                                      'kills': '1874',
                                      'losses': '64',
                                      'played': '210',
                                      'wins': '146'},
                                     {'assists': '501',
                                      'champion_id': '67',
                                      'creep_score': '16836',
                                      'deaths': '584',
                                      'kills': '662',
                                      'losses': '37',
                                      'played': '90',
                                      'wins': '53'},
                                     {'assists': '124',
                                      'champion_id': '157',
                                      'creep_score': '5058',
                                      'deaths': '205',
                                      'kills': '141',
                                      'losses': '14',
                                      'played': '28',
                                      'wins': '14'}],
           'name': 'Richor',
           'previous_ranking': '1',
           'profile_icon_id': 577,
           'ranking': '2',
           'region': 'eune',
           'summoner_id': '40385818',
           'tier': '6',
           'tier_name': 'CHALLENGER',
           'wins': '254'},
          {'division': '1',
           'global_ranking': '49',
           'league_points': '1051',
           'lks': '2953',
           'losses': '47',
           'most_played_champions': [{'assists': '638',
                                      'champion_id': '117',
                                      'creep_score': '11927',
                                      'deaths': '99',
                                      'kills': '199',
                                      'losses': '7',
                                      'played': '66',
                                      'wins': '59'},
                                     {'assists': '345',
                                      'champion_id': '48',
                                      'creep_score': '8061',
                                      'deaths': '99',
                                      'kills': '192',
                                      'losses': '11',
                                      'played': '43',
                                      'wins': '32'},
                                     {'assists': '161',
                                      'champion_id': '114',
                                      'creep_score': '5584',
                                      'deaths': '64',
                                      'kills': '165',
                                      'losses': '11',
                                      'played': '31',
                                      'wins': '20'}],
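Since the goal in the question is a username scraper, the names can be pulled out of this structure with a short helper. The sketch below works on a trimmed copy of the output shown above, keeping only a few fields; the helper name `usernames` is made up for illustration, and it assumes every entry has `name` and `ranking` keys as in the sample.

```python
# Trimmed copy of the structure shown above (only a few fields kept).
data = {
    "data": [
        {"name": "Sadastyczny", "ranking": "1", "tier_name": "CHALLENGER"},
        {"name": "Richor", "ranking": "2", "tier_name": "CHALLENGER"},
    ]
}


def usernames(payload):
    """Return summoner names in leaderboard order."""
    # Rankings arrive as strings, so convert them before sorting.
    players = sorted(payload["data"], key=lambda p: int(p["ranking"]))
    return [p["name"] for p in players]


print(usernames(data))
```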
Padraic Cunningham