
I'm trying to scrape the contents of this page on my Linux machine. I want to display the full list of wines by repeatedly clicking the "Show More" button (around 600) until no more "Show More" buttons appear. I'm using Selenium with PhantomJS to handle the JavaScript, and time.sleep() so that once I click the "Show More" button, the script sleeps for a short time until the next one appears. The problem I'm facing is that initially the program clicks the "Show More" button quickly, but once it reaches around 100-150 clicks, the time taken to detect the button increases at an alarming rate, taking far too long. Below is the code that detects the "Show More" button and clicks it.

    def parse(self, response):
        sel = Selector(self.driver.get(response.url))

        self.driver.get(response.url)
        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")

        try:
            while click:
                click[0].click()
                time.sleep(2)
        except Exception:
            print 'no clicks'

1 Answer


An Explicit Wait (instead of time.sleep()) can make a positive impact here:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, 10)
    click = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='btn-more-wines']")))

This would basically wait for the "Show More" button to become clickable.
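
For instance, the click loop from the question could be restructured around that wait. This is only a sketch: the 5-second timeout and the assumption that a timeout means no "Show More" button is left are mine, not part of the original answer.

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.common.exceptions import TimeoutException

    wait = WebDriverWait(driver, 5)
    while True:
        try:
            # Re-locate the button each iteration so a fresh, clickable element is used
            button = wait.until(
                EC.element_to_be_clickable((By.XPATH, "//*[@id='btn-more-wines']"))
            )
        except TimeoutException:
            break  # no clickable "Show More" button appeared within the timeout; assume the list is fully expanded
        button.click()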


Another possible improvement could be achieved by switching to Chrome in headless mode (with a virtual display).
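
A minimal sketch of that setup, assuming the pyvirtualdisplay package on top of Xvfb (the answer only mentions "a virtual display", so the exact tooling is an assumption):

    from pyvirtualdisplay import Display
    from selenium import webdriver

    display = Display(visible=0, size=(1366, 768))  # in-memory X display, no real screen needed
    display.start()

    driver = webdriver.Chrome()  # Chrome renders into the virtual display

    # ... load the page and click through the "Show More" buttons here ...

    driver.quit()
    display.stop()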
