
Given the following (slightly pseudo) code:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver import ChromeOptions, Chrome

options = ChromeOptions()
driver = Chrome(options=options)

waiter = WebDriverWait(driver, 10)
list_of_urls = [<list_of_urls>]

for url in list_of_urls:
    driver.get(url)  # load the page before waiting on its elements

    locator = (By.XPATH, "xpath_element_A")
    element_A_condition = expected_conditions.presence_of_element_located(locator)
    element_A = waiter.until(element_A_condition)

    try:
        locator = (By.XPATH, "xpath_sub_element_A")
        sub_element_A_condition = expected_conditions.presence_of_element_located(locator)
        sub_element_A = waiter.until(sub_element_A_condition)
    except TimeoutException as e:
        raise e

I'm finding that about 2-3% of the URLs I try to scrape are raising the TimeoutException.

I've tried extending the wait time and I've even tried refreshing the page multiple times and attempting the entire page-scrape again - all to no avail.
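
Roughly, the refresh-and-retry attempt looked like this (a sketch only; the retry count of 3 is illustrative):

for attempt in range(3):  # illustrative retry count
    try:
        sub_element_A = waiter.until(sub_element_A_condition)
        break  # element found; stop retrying
    except TimeoutException:
        driver.refresh()  # reload the page and run the wait again
else:
    raise TimeoutException(f"element still missing after retries: {url}")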

To try to get to the bottom of this, I put a breakpoint on the final line and ran the code in debug mode. When the exception was raised and the breakpoint was hit, I ran waiter.until(sub_element_A_condition) again in the debug terminal and it immediately returned sub_element_A.

I've now repeated this debugging process multiple times and the result is always the same: the TimeoutException is raised and the breakpoint is hit, but I'm then able to run waiter.until(sub_element_A_condition) immediately and it always returns the element.

This is most perplexing. The only thing I think I did differently when the exceptions were raised is that I switched to the browser window (I run non-headless) to manually eyeball whether the element was on the page. Could switching focus to the window be doing something that causes the element to become visible?


2 Answers


As you are trying to scrape, instead of presence_of_element_located() you need to induce WebDriverWait for visibility_of_element_located(): presence only requires the element to exist in the DOM, while visibility also requires it to be displayed. Your modified code block will be:

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException
from selenium.webdriver import ChromeOptions, Chrome

options = ChromeOptions()
driver = Chrome(options=options)

waiter = WebDriverWait(driver, 10)
list_of_urls = [<list_of_urls>]

for url in list_of_urls:
    driver.get(url)  # load the page before waiting on its elements

    locator = (By.XPATH, "xpath_element_A")
    element_A_condition = expected_conditions.visibility_of_element_located(locator)
    element_A = waiter.until(element_A_condition)

    try:
        locator = (By.XPATH, "xpath_sub_element_A")
        sub_element_A_condition = expected_conditions.visibility_of_element_located(locator)
        sub_element_A = waiter.until(sub_element_A_condition)
    except TimeoutException as e:
        raise e

In a single line:

element_A = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "xpath_element_A")))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
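
To see the difference between the two conditions, here is a minimal self-contained demonstration (the data: URL and the hidden <div> are illustrative only, not from your page):

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = Chrome()
# An element that is in the DOM but hidden with display:none
driver.get("data:text/html,<div id='x' style='display:none'>hi</div>")

# presence succeeds: the element exists in the DOM even though it is hidden
WebDriverWait(driver, 3).until(EC.presence_of_element_located((By.ID, "x")))

# visibility times out: the element is never displayed
WebDriverWait(driver, 3).until(EC.visibility_of_element_located((By.ID, "x")))  # raises TimeoutException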
– undetected Selenium
  • Thanks for this! I've implemented what you suggest and so far so good. However, it takes a while for the rest of the application to generate the URLs so it might be a couple of days before I can conclusively say it fixed things! If all good then I will be back to hit that tick :) – Jossy Jul 10 '23 at 01:54
  • Hi. Afraid I'm still triggering the same `TimeoutException` :( – Jossy Jul 10 '23 at 03:24

You should add waits; try using contains() in your XPath instead.
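
For example, a partial-match XPath of this shape (the tag and class fragment are hypothetical; substitute whatever is stable on your page), reusing waiter and expected_conditions from the question:

# 'result-card' is a hypothetical class fragment, not from the asker's page
locator = (By.XPATH, "//div[contains(@class, 'result-card')]")
sub_element_A = waiter.until(expected_conditions.presence_of_element_located(locator))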

– roudlek