
I have a set of web scrapers designed to run in Python 3.6 using Selenium with ChromeDriver. All of them ran perfectly.

This week I updated Selenium to v2.8 and ChromeDriver to v2.34.

Immediately, the scrapers stopped working normally and started crashing at some early point of the crawl.

I have a small sys.stdout wrapper that writes both to a .txt file and to the console, so I started noticing that the errors look like this:

Message: no such frame
(Session info: chrome=63.0.3239.108)
(Driver info: chromedriver=2.34.522940 
(1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.15063 x86_64)

or

Message: no such element: Unable to locate element: 
{"method":"name","selector":"txtClave"}
(Session info: chrome=63.0.3239.108)
(Driver info: chromedriver=2.34.522940 
(1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.15063 x86_64)

Message: no such element: Unable to locate element: 
{"method":"xpath","selector":"//*[@id="ctl00_cp_wz_ddlTarjetas"]/option[2]"}
(Session info: chrome=63.0.3239.108)
(Driver info: chromedriver=2.34.522940 
(1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.15063 x86_64)

Message: no such element: Unable to locate element: 
{"method":"xpath","selector":"//*[@id="ctl00_cp_wz_ddlTarjetas"]/option[3]"}
(Session info: chrome=63.0.3239.108)
(Driver info: chromedriver=2.34.522940 
(1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.15063 x86_64)

Message: no such element: Unable to locate element: 
{"method":"xpath","selector":"//*[@id="ctl00_cp_wz_ddlTarjetas"]/option[4]"}
(Session info: chrome=63.0.3239.108)
(Driver info: chromedriver=2.34.522940 
(1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.15063 x86_64)

These are often followed by a Windows crash message that never appeared before: chromedriver.exe has stopped working.

By watching the Chrome window and debugging, I suspect the errors are raised at lines where the spider should wait for a page to load; it doesn't wait, so it fails to find the elements.

Examples of the lines causing the errors:

self.driver.find_element_by_name('txtUsuario').send_keys(user + Keys.RETURN)
self.driver.find_element_by_name('txtClave').send_keys(passwd + Keys.RETURN)

...

self.driver.switch_to.default_content()
self.driver.switch_to_frame('Fmenu')
self.driver.find_element_by_xpath(XPATH_POSICIONGLOBAL).click()

It feels too sad (basically like surrendering) to fall back on adding explicit waits to EVERY element I interact with (maybe because I have more than a hundred of them?).

I hope someone can help me find out what might have caused a whole set of working spiders to fail crawling on these new versions of ChromeDriver / Selenium, and suggest an elegant, easy-to-implement workaround.

For example, I tried adding an implicitly_wait to the WebDriver session, but it simply is not working at all.

def __init__(self):
    self.driver = webdriver.Chrome(PATH_WEBDRIVER)
    self.driver.implicitly_wait(10)

Finally, I used IDLE to run two of the failing spiders one line at a time, and it works! So... why isn't it working during regular spider execution?

Many, many thanks in advance

Jesus21282

1 Answer


Let me try to address your errors one by one:

  • Message: no such frame: This error can surface if you are trying to switch to a frame before it is available. In that case you need to induce WebDriverWait with a matching expected_conditions for the <iframe> to be available for switching.

You can find a detailed discussion in How can I select a html element no matter what frame it is in in selenium?
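
A minimal sketch of that approach, assuming the 'Fmenu' frame name from the question, could look like this:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Leave any previously selected frame, then wait up to 10 seconds for
    # the 'Fmenu' frame to become available and switch into it.
    self.driver.switch_to.default_content()
    WebDriverWait(self.driver, 10).until(
        EC.frame_to_be_available_and_switch_to_it('Fmenu')
    )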

  • Message: no such element: Unable to locate element: This error can surface if the WebElement you are trying to interact with is not yet present, visible, clickable or interactable. In that case you need to induce WebDriverWait with a matching expected_conditions. Here is a list of widely used methods:

    class selenium.webdriver.support.expected_conditions.presence_of_element_located(locator)        
    class selenium.webdriver.support.expected_conditions.visibility_of_element_located(locator)
    class selenium.webdriver.support.expected_conditions.element_to_be_clickable(locator)
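
For example, a minimal sketch of an explicit wait around one of the failing lookups from the question (the locator value is taken from the error messages above) could be:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Wait until the password field is clickable before typing into it.
    clave = WebDriverWait(self.driver, 10).until(
        EC.element_to_be_clickable((By.NAME, 'txtClave'))
    )
    clave.send_keys(passwd)
    clave.send_keys(Keys.RETURN)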
    
  • chromedriver.exe has stopped working: This error can surface if your system JDK version, Selenium version, ChromeDriver version and Chrome browser version are not compatible with each other and are out of sync.

  • self.driver.find_element_by_name('txtUsuario').send_keys(user + Keys.RETURN): Try to break the line down into two steps as follows:

    self.driver.find_element_by_name('txtUsuario').send_keys(user)
    self.driver.find_element_by_name('txtUsuario').send_keys(Keys.RETURN)
    
  • self.driver.implicitly_wait(10): Try to get rid of the implicit wait. To keep the leading WebDriver instance in sync with the trailing web client we have to use WebDriverWait, i.e. an explicit wait. But you shouldn't mix implicit waits with explicit waits. See the note and the sketch below:

Note: The Selenium documentation clearly states: WARNING: Do not mix implicit and explicit waits. Doing so can cause unpredictable wait times. For example, setting an implicit wait of 10 seconds and an explicit wait of 15 seconds could cause a timeout to occur after 20 seconds.
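
As a rough sketch of that idea (the wait_find helper name is purely illustrative, not part of the Selenium API), you could create a single WebDriverWait per spider inside your spider class and reuse it for every lookup:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    def __init__(self):
        self.driver = webdriver.Chrome(PATH_WEBDRIVER)
        # One explicit wait per spider, reused everywhere; no implicitly_wait()
        # on the same driver.
        self.wait = WebDriverWait(self.driver, 10)

    def wait_find(self, by, selector):
        # Blocks until the element is present in the DOM, then returns it.
        return self.wait.until(EC.presence_of_element_located((by, selector)))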


Conclusion

If the requirement is only scraping / crawling webpages, use BeautifulSoup to scrape/crawl and use Selenium only to navigate through the pages.
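
A minimal sketch of that split, assuming BeautifulSoup 4 (bs4) is installed, could be:

    from bs4 import BeautifulSoup

    # Use Selenium only to navigate / log in, then hand the rendered page
    # source to BeautifulSoup for the actual scraping.
    soup = BeautifulSoup(self.driver.page_source, 'html.parser')
    # The id below is taken from the question's XPath; adjust it to the real page.
    for option in soup.select('#ctl00_cp_wz_ddlTarjetas option'):
        print(option.get_text(strip=True))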

undetected Selenium
  • This is definitely a very disappointing issue to have with something that worked SO smoothly :( I've been digging a LOT inside the WebDriver source code and suspect it is a ChromeDriver bug instead. I started applying the explicit wait method to get back to work, but there are so many spiders to modify, at 10+ find_elements each, that I'm considering writing a custom waiter that can be attached to the driver (defined once per spider) instead of writing WebDriverWaits for every single failing find_element – Jesus21282 Dec 21 '17 at 14:29
  • Initialize **WebDriverWait** only once and try reusing the `WebDriverWait` instance on each and every `expected_conditions`. Again, as you are only crawling you can narrow down to a single `expected_conditions` which is **presence_of_element_located**. So a niche idea would be to write a `function()` and pass the concerned `WebElement` to be returned as **scrapable** / **crawlable**. Hope this helps. – undetected Selenium Dec 21 '17 at 14:34
  • Yes. I implemented two very similar functions inside my spiders: `def custom_find(self, by, selector): return WebDriverWait(self.driver, 30).until(EC.visibility_of_element_located((by, selector)))` **... and ...** `def custom_frame_switch(self, locator): WebDriverWait(self.driver, 30).until(EC.frame_to_be_available_and_switch_to_it(locator))`. This way I can reuse the WebDriverWait instance as many times as needed without many complicated Replace runs in my Notepad++. – Jesus21282 Dec 21 '17 at 22:09
  • In the end, decided to roll back to Chrome 62, since it completely fixes all the errors stated. – Jesus21282 Dec 25 '17 at 18:01
  • @Jesus21282 Although there are no best practices here, downgrading shouldn't be the solution. Try to use the latest builds and we will solve the hurdles one by one. – undetected Selenium Dec 26 '17 at 11:47