I'm using Node.js and the npm module phantom to scrape a web page. The information I need is loaded by an AJAX request when a span is clicked.

Objective: On the page 'www.academiadasapostas.com/stats/team/961#tab=t_stats' I want to click the 'Bundesliga' button and scrape the info it loads.

Problem: I can't go directly to the button's URL (www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961&competition_id=9&page=1), and I don't know how to click a button in Phantom.

My code:

var url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats';
phantomInstance.createPage()
    .then((page) => {
        phantomPage = page;
        return page.open(url);
    })
    .then((status) => {
        phantomPage.evaluate(function() {
            //trying click
            return document.querySelectorAll('[data-id]')[1].click();
        })
        .then(function(){
            return phantomPage.property('content');
        })
        .then((content) => {
            // handle content of page
        });
    });

HTML snapshot:

<td> 
    <span class="competition all " data-id="0" onclick="teamAjax_Filterchange(this)" style="float: left; display: none;">Tudo
    </span>
    <span class="competition " data-id="9" onclick="teamAjax_Filterchange(this)">                                  
        <ul class="flag" title=""><li class="ar a80" title=""></li><li class="co c1"></li><li class="co chover"></li></ul>Bundesliga
    </span>
    <span class="competition " data-id="10" onclick="teamAjax_Filterchange(this)">                                     
        <ul class="flag" title=""><li class="ar a7" title=""></li><li class="co clc"></li><li class="co chover"></li></ul>UEFA Champions League
    </span>
</td>

EDIT 1: I tried this, but it doesn't seem to work either:

phantomPage.evaluate(function() { 
    var ev = document.createEvent("MouseEvent");
    ev.initMouseEvent(
        "click",
        true /* bubble */, true /* cancelable */,
        window, null,
        0, 0, 0, 0, /* coordinates */
        false, false, false, false, /* modifier keys */
        0 /*left*/, null
    );
    return document.querySelectorAll('[data-id]')[1].dispatchEvent(ev);
})
  • Have you tried the suggestions in [this question](http://stackoverflow.com/q/15739263/1816580)? You would have to port them to the way this is written for the bridge. – Artjom B. Apr 20 '16 at 19:30
  • Yes, I tried the dispatchEvent suggestion, but it doesn't seem to work either. (Added code: EDIT 1) – João Costa Apr 21 '16 at 12:14

1 Answer

I was able to scrape that page with the following code, using Python and PhantomJS (driven through Selenium):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

# Note: webdriver.PhantomJS requires Selenium 3.x; it was removed in Selenium 4.
url = 'https://www.academiadasapostas.com/stats/team/961#tab=t_stats&team_id=961'
driver = webdriver.PhantomJS()
driver.set_window_size(1024, 768)

try:
    driver.get(url)

    # Wait (up to 40 s) for the Bundesliga button to be present, then click it.
    button_xpath = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/div/table/tbody/tr[1]/td[2]/span[2]"
    button = WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, button_xpath)))
    button.click()

    # Wait for the last row loaded by the click (Cartões vermelhos) to appear.
    last_row_xpath = ".//*[@id='s']/div/div/div/div/div[2]/div/div[3]/table[2]/tbody/tr[19]/td[1]"
    WebDriverWait(driver, 40).until(EC.presence_of_element_located((By.XPATH, last_row_xpath)))

    # Dump the prettified page source to a file for inspection.
    soup = BeautifulSoup(driver.page_source, 'lxml')
    with open('temp.txt', 'w') as f:
        f.write(soup.prettify())
finally:
    # Always shut down the PhantomJS process, even if a wait times out.
    driver.quit()

I used the Bundesliga button's XPath to locate it and click it. Then I used a second XPath, for the last row that appears once the click has succeeded (Cartões vermelhos), so that the script waits until all items have loaded after the click.
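The "click, then wait for something that only exists after the AJAX call finishes" idea does not depend on Selenium. As a rough sketch only: `wait_until` below is a hypothetical helper (not part of Selenium or PhantomJS) that polls a condition until it is truthy or a timeout expires, which is a simplified version of what `WebDriverWait(...).until(...)` does. The simulated `last_row_present` function stands in for a real check against the page.

```python
import time


def wait_until(condition, timeout=40.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value or the timeout expires.

    Hypothetical helper for illustration; returns the truthy value,
    or raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)


# Simulated page state: the row only "appears" on the third check,
# the way AJAX-loaded content shows up some time after the click.
state = {"checks": 0}


def last_row_present():
    state["checks"] += 1
    return "Cartões vermelhos" if state["checks"] >= 3 else None


row = wait_until(last_row_present, timeout=5.0, poll_interval=0.01)
print(row)
```

Reading the page content immediately after the click, without a wait of this kind, risks capturing the page before the new rows have been inserted.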

I used BeautifulSoup to read that page and write it "prettified" to a file, to confirm that everything had loaded okay.

If you are unfamiliar with XPath, install the Firebug and FirePath add-ons in Firefox; you can then get an element's XPath by right-clicking it.
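Tool-generated absolute XPaths tend to break when the page layout changes. A possible alternative, sketched here under the assumption that the live page still matches the HTML snapshot in the question, is to select the button by its `data-id` attribute. The example only parses the snapshot with Python's standard library to show that the expression matches; it does not touch the real site.

```python
import xml.etree.ElementTree as ET

# HTML snapshot from the question (it happens to be well-formed XML).
snippet = """
<td>
    <span class="competition all " data-id="0" onclick="teamAjax_Filterchange(this)" style="float: left; display: none;">Tudo
    </span>
    <span class="competition " data-id="9" onclick="teamAjax_Filterchange(this)">
        <ul class="flag" title=""><li class="ar a80" title=""></li><li class="co c1"></li><li class="co chover"></li></ul>Bundesliga
    </span>
    <span class="competition " data-id="10" onclick="teamAjax_Filterchange(this)">
        <ul class="flag" title=""><li class="ar a7" title=""></li><li class="co clc"></li><li class="co chover"></li></ul>UEFA Champions League
    </span>
</td>
""".strip()

root = ET.fromstring(snippet)

# Attribute-based lookup: the span whose data-id is "9" (Bundesliga in the snapshot).
button = root.find(".//span[@data-id='9']")

# Collect all text inside the span and trim the surrounding whitespace.
label = "".join(button.itertext()).strip()
print(label)
```

With Selenium, the equivalent expression `//span[@data-id='9']` could be passed to `By.XPATH` in place of the long absolute path, again assuming the markup is unchanged.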

Hope this helps.

dmdip