8

I am trying to fetch the links to all accomodations in Cyprus from this website: http://www.zoover.nl/cyprus

So far I can retrieve the first 15 which are already shown. So now I have to invoke the click on the "volgende"-link. However I don't know how to do that and in the source code I am not able to track down the function called to use e.g. sth like posted here: Issues with invoking "on click event" on the html page using beautiful soup in Python

I only need the step where the "clicking" happens so I can fetch the next 15 links and so on.

Does anybody know how to help? Thanks already!

EDIT:

My code looks like this now:

def getZooverLinks(country):
    zooverWeb = "http://www.zoover.nl/"
    url = zooverWeb + country
    parsedZooverWeb = parseURL(url)
    driver = webdriver.Firefox()
    driver.get(url)

    button = driver.find_element_by_class_name("next")
    links = []
    for page in xrange(1,3):
        for item in parsedZooverWeb.find_all(attrs={'class': 'blue2'}):
            for link in item.find_all('a'):
                newLink = zooverWeb + link.get('href')
                links.append(newLink)
        button.click()'

and I get the following error:

selenium.common.exceptions.StaleElementReferenceException: Message: Element is no longer attached to the DOM Stacktrace: at fxdriver.cache.getElementAt (resource://fxdriver/modules/web-element-cache.js:8956) at Utils.getElementAt (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:8546) at fxdriver.preconditions.visible (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:9585) at DelayedCommand.prototype.checkPreconditions_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12257) at DelayedCommand.prototype.executeInternal_/h (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12274) at DelayedCommand.prototype.executeInternal_ (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12279) at DelayedCommand.prototype.execute/< (file:///var/folders/n4/fhvhqlmx23s8ppxbrxrpws3c0000gn/T/tmpKFL43_/extensions/fxdriver@googlecode.com/components/command-processor.js:12221)

I'm confused :/

Community
  • 1
  • 1
steph
  • 555
  • 2
  • 6
  • 21

2 Answers2

7

While it might be tempting to try to do this using Beautifulsoup's evaluateJavaScript method, in the end Beautifulsoup is a parser rather than an interactive web browsing client.

You should seriously consider solving this with selenium, as briefly shown in this answer. There are pretty good Python bindings available for selenium.

You could just use selenium to find the element and click it, and then pass the page on to Beautifulsoup, and use your existing code to fetch the links.

Alternatively, you could use the Javascript that's listed in the onclick handler. I pulled this from the source: EntityQuery('Ns=pPopularityScore%7c1&No=30&props=15292&dims=530&As=&N=0+3+10500915');. The No parameter increments with 15 for each page, but the props has me guessing. I'd recommend not getting into this, though, and just interact with the website as a client would, using selenium. That's much more robust to changes on their side, as well.

Community
  • 1
  • 1
Joost
  • 4,094
  • 3
  • 27
  • 58
  • Great tip, and it seems to do what I wanted it to do. Anyhow, there is a problem you may help me with – steph Apr 01 '15 at 10:19
  • What would that problem be? – Joost Apr 01 '15 at 10:21
  • I'm sorry, I'm stuck with slow Internet connection so I pressed the button too often ;) You can find the problem under EDIT – steph Apr 01 '15 at 10:30
  • Generally it's best to stick to one subject per SO question.. In any case, you seem to be using the old state of `parsedZooverWeb` after changing the page. Instead, replace the call to `find_all` with another `driver.find_element_by_class_name`. – Joost Apr 01 '15 at 10:39
  • Thanks a lot! Sorry for mixing questions though! – steph Apr 01 '15 at 10:47
6

I tried the following code and was able to load next page. Hope this will help you too. Code:

from selenium import webdriver
import os
chromedriver = "C:\Users\pappuj\Downloads\chromedriver"
os.environ["webdriver.chrome.driver"] = chromedriver
driver = webdriver.Chrome(chromedriver)
url='http://www.zoover.nl/cyprus'
driver.get(url)
driver.find_element_by_class_name('next').click()

Thanks