When I view the page source in my browser, the HTML I am after appears there. However, when I make a request using Python's requests library, the HTML doesn't appear in the response.

The URL I'm trying to scrape is http://dota2lounge.com/match?m=13362, and the specific HTML I am after in the page is:

<div class="full">
    <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
    <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
    <a class="button" onclick="ChoseEvent(13424,'2nd Game','1462327200')">2nd Game</a>
    <br><div id="toma" class="full" style="background: #444;line-height: 2.5rem;border: 1px solid #333;text-align: center;">Whole Match</div>
</div>

I'd like to get the 'onclick' values of the buttons. So far I've tried:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('http://dota2lounge.com/match?m=13268')
soup = bs(r.content, 'lxml')
buttons = soup.find_all('a', class_='button')

Which doesn't work.

r.content

Doesn't appear to show the html either.

Peter
  • Try ```soup.find_all('a', 'button')```. Btw sounds like you have a typo in the param class: ```soup.find_all('a', class='button')``` – Jeremie Ges May 04 '16 at 08:02

2 Answers

It looks like the elements you want are added by JavaScript, which isn't run when you make the request in Python. Check out this question.
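One way to confirm this is to check whether the markup is present in the raw response text before doing any parsing. This is only a minimal sketch: `is_in_static_html` is a hypothetical helper name, and the default marker string is taken from the `onclick` values in the question.

```python
def is_in_static_html(html, marker="ChoseEvent("):
    """Return True if the marker appears in the HTML exactly as the server sent it."""
    return marker in html


# Markup as the browser shows it (copied from the question)
rendered = '<a class="button" onclick="ChoseEvent(13362,\'Whole Match\',false)">Match</a>'
# Hypothetical example of a response that only carries a placeholder element
placeholder_only = '<div id="toma" class="full"></div>'

print(is_in_static_html(rendered))
print(is_in_static_html(placeholder_only))

# Usage with a live response (needs the third-party requests package):
#   import requests
#   r = requests.get("http://dota2lounge.com/match?m=13362", timeout=10)
#   print(is_in_static_html(r.text))
```

If the check comes back false for `r.text`, no parser choice will help, because the buttons are simply not in what the server returned.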

If you're just scraping this once (i.e. you just want the data and you're not trying to build a bot to play the game for you), the quickest option is often to create a .htm file containing only links to every page you want to scrape (put each link in an <a> tag; you don't even need link text). You can then use a tool like DownThemAll in Firefox to save a local copy of every page with the proper formatting.
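Once you have local copies, something like the following could pull out the `onclick` values. It is a rough sketch using only the standard library, and it assumes the saved copy actually contains the rendered buttons. `ButtonOnclickParser`, `extract_onclicks` and the file name in the comment are made-up names for illustration; the sample markup is copied from the question.

```python
from html.parser import HTMLParser


class ButtonOnclickParser(HTMLParser):
    """Collect onclick values from <a> tags whose class list includes "button"."""

    def __init__(self):
        super().__init__()
        self.onclicks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "button" in classes and attr_map.get("onclick"):
            self.onclicks.append(attr_map["onclick"])


def extract_onclicks(html):
    """Return the onclick strings of all button links found in the HTML text."""
    parser = ButtonOnclickParser()
    parser.feed(html)
    parser.close()
    return parser.onclicks


# For a saved page, read the file instead of using this sample:
#   with open("match_13362.htm", encoding="utf-8") as f:  # hypothetical file name
#       html = f.read()
html = """<div class="full">
    <a class="button" onclick="ChoseEvent(13362,'Whole Match',false)">Match</a>
    <a class="button" onclick="ChoseEvent(13392,'1st Game','1462327200')">1st Game</a>
</div>"""

for value in extract_onclicks(html):
    print(value)
```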

Joseph

Try this:

from bs4 import BeautifulSoup

soup = BeautifulSoup(r.text, "html.parser")
for link in soup.find_all('a'):
    # .get() returns None for links that have no onclick attribute
    print(link.get('onclick'))
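Since that loop prints every link's `onclick` (including `None` for links without one), you may want to keep only the `ChoseEvent(...)` handlers. A small sketch of one way to do that, assuming the values look like the ones in the question; `parse_chose_events` and the regular expression are my own illustration, not anything the site provides.

```python
import re

# Matches the event id and label in strings like ChoseEvent(13392,'1st Game','1462327200')
CHOSE_EVENT = re.compile(r"ChoseEvent\((\d+),\s*'([^']*)'")


def parse_chose_events(onclick_values):
    """Return (event_id, label) pairs, skipping None and unrelated handlers."""
    events = []
    for value in onclick_values:
        if not value:
            continue  # link had no onclick attribute
        match = CHOSE_EVENT.search(value)
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


# Sample values: one missing attribute, one unrelated handler, one from the question
sample = [None, "selectTeam($(this), 'a')", "ChoseEvent(13392,'1st Game','1462327200')"]
print(parse_chose_events(sample))
```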
Suraj
  • Thanks but I tried your suggested parser and that didn't work. If I look into the text from the Request response I still can't see the html there. Are there any reasons it would be rendered in my browser but not in the Python Request? – Peter May 04 '16 at 09:54
  • I didn't find your HTML section in the page source. When I try this code on the http://dota2lounge.com/match?m=13362 URL, it finds 2 onclick selectTeam($(this), 'a') functions there. – Suraj May 04 '16 at 10:43