
I started using Selenium yesterday to help scrape some data, and I'm having a difficult time wrapping my head around its selector engine. I know lxml, BeautifulSoup, jQuery and Sizzle have similar engines. What I'm trying to do is:

  1. Wait up to 10 seconds for the page to load completely
  2. Make sure ten or more span.eN elements are present (two load on the initial page load and more are injected afterwards)
  3. Then start processing the data with BeautifulSoup

I am struggling with the Selenium conditions for either finding the nth element or locating specific text that only exists in the nth element. I keep getting errors (TimeoutException, NoSuchElementException, etc.).

    from selenium import webdriver

    url = "http://someajaxiandomain.com/that-injects-html-after-pageload.aspx"
    wd = webdriver.Chrome()
    wd.implicitly_wait(10)
    wd.get(url)
    # what I've tried
    # .find_element_by_xpath("//span[@class='eN'][10]"))
    # .until(EC.text_to_be_present_in_element(By.CSS_SELECTOR, "css=span[class='eN']:contains('foo')"))
user1645914
  • It's hard to provide a solution without seeing the HTML. Please provide some HTML if possible. – Saifur May 11 '15 at 21:11
  • Here is an example of the prettified HTML: https://paste.ee/p/hR3f6 - I am after span.eN or tbody.EventBody being greater than 10 OR for a span.eN to contain "Triple Jump" (usually the last to load). It's really just the tabular data I'm interested in. Initially only 4 or 5 tbod[ies] load and then the rest is injected after the initial pageload. – user1645914 May 11 '15 at 21:37

1 Answer


You need to understand the concepts of Explicit Waits and Expected Conditions.
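To make the idea concrete, here is a minimal sketch of what an explicit wait does conceptually: it calls a condition with the driver over and over until the condition returns something truthy or a timeout expires. The `poll_until` helper and the `FakeDriver` class are hypothetical names made up for this illustration so that it runs without a browser; this is not Selenium's actual implementation.

```python
import time


def poll_until(driver, condition, timeout=10.0, poll_frequency=0.5):
    """Call condition(driver) repeatedly until it returns something truthy.

    Hypothetical helper that imitates the general shape of an explicit wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition(driver)
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %s seconds" % timeout)
        time.sleep(poll_frequency)


class FakeDriver:
    """Stand-in for a browser: each lookup 'injects' one more element."""

    def __init__(self):
        self.loaded = 0

    def find_elements(self, by, value):
        self.loaded += 1
        return ["<span class='eN'>"] * self.loaded


driver = FakeDriver()
# Succeeds on the tenth poll, once ten elements are "present".
poll_until(
    driver,
    lambda d: len(d.find_elements("css selector", "span.eN")) >= 10,
    timeout=5,
    poll_frequency=0.01,
)
```

In real code, `WebDriverWait(driver, timeout).until(condition)` plays the role of `poll_until`, and the condition can be any callable that takes the driver.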

In your case, you can write a custom Expected Condition that waits until the number of elements found by a locator is at least n:

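    from selenium.common.exceptions import StaleElementReferenceException


    class wait_for_n_elements_to_be_present(object):
        """Expected Condition: at least `count` elements match `locator`."""

        def __init__(self, locator, count):
            self.locator = locator  # (By, value) tuple, e.g. (By.CSS_SELECTOR, 'span.eN')
            self.count = count

        def __call__(self, driver):
            try:
                # find_elements returns an empty list when nothing matches yet
                elements = driver.find_elements(*self.locator)
                return len(elements) >= self.count
            except StaleElementReferenceException:
                return False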

Usage:

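    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    n = 10  # specify how many elements to wait for

    wait = WebDriverWait(driver, 10)  # poll for up to 10 seconds
    wait.until(wait_for_n_elements_to_be_present((By.CSS_SELECTOR, 'span.eN'), n))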

Alternatively, a built-in Expected Condition such as presence_of_element_located or visibility_of_element_located may be enough if you only need to wait for a single span.eN element to be present or visible. For example:

    from selenium.webdriver.support import expected_conditions as EC

    wait = WebDriverWait(driver, 10)
    wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'span.eN')))
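The comments on the question describe a slightly different goal: proceed once there are enough span.eN elements, or once one of them contains "Triple Jump". A rough sketch of such a condition follows. The class name `enough_elements_or_text` is hypothetical, and the stub driver and element exist only so the snippet runs without a browser; a real WebElement also exposes `.text`.

```python
class enough_elements_or_text(object):
    """Hypothetical condition: true once `count` elements match the locator,
    or once any matching element's text contains `text`."""

    def __init__(self, locator, count, text):
        self.locator = locator
        self.count = count
        self.text = text

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        if len(elements) >= self.count:
            return True
        return any(self.text in element.text for element in elements)


# Stubs for illustration only; they are not part of Selenium.
class StubElement:
    def __init__(self, text):
        self.text = text


class StubDriver:
    def __init__(self, texts):
        self.texts = texts

    def find_elements(self, by, value):
        return [StubElement(t) for t in self.texts]


condition = enough_elements_or_text(("css selector", "span.eN"), 10, "Triple Jump")
print(condition(StubDriver(["100m", "Long Jump"])))    # False: too few, no match
print(condition(StubDriver(["100m", "Triple Jump"])))  # True: text found
```

With a real driver this would be passed to the wait in the same way as the condition above, for example `wait.until(enough_elements_or_text((By.CSS_SELECTOR, 'span.eN'), 10, 'Triple Jump'))`. Reading `.text` on a live page can raise StaleElementReferenceException if the element is replaced mid-check, so a production version would probably want to catch that and return False.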
alecxe
  • @Saifur I really hope you are not leveraging your self appointed prowess or calling someone a "wrong person" to be a troll -- it's an honest question and I am grateful for those who are trying to help. – user1645914 May 11 '15 at 21:45
  • @user1645914 no, no, we just exchanged a couple of jokes and removed the off-topic comments; this is the only one left, so it's out of context. Saifur is definitely here on SO to help. – alecxe May 11 '15 at 21:46
  • @user1645914 My apology. I personally respect alecxe and all his efforts and of course **ANYONE** who asks questions on SO. It's the best place on the earth to get help when you are ALONE in dark. – Saifur May 11 '15 at 21:48
  • @alecxe Thank you for your help. I am putting this into my code and still running into selenium.common.exceptions.TimeoutException: Message: '' -- it's probably on my end still but I will accept your detailed answer once I get it to work on my end. Thank you. – user1645914 May 11 '15 at 21:48
  • @user1645914 I haven't personally tested the provided custom expected condition, but it itself should work. TimeoutException could be caused by an incorrect CSS selector, for example. Thanks. – alecxe May 11 '15 at 21:50
  • @user1645914 I've also applied a small fix to the expected condition (using `>=` instead of `==`), please update it in your code accordingly. – alecxe May 11 '15 at 21:50
  • @Saifur thank you! Sorry for leaving your comment alone. Though, it was a funny coincidence. The comment totally changed its sense without context :) – alecxe May 11 '15 at 21:51
  • @alecxe thank you very much. I have accepted your answer. Only took me two hours to realize the HTML I was parsing is "off by 1" in one spot and a tag wasn't closed properly. Threw everything off! – user1645914 May 12 '15 at 00:13