
I have a Python script that uses Selenium to scrape Google results. It works, but I'm looking for a better way to wait until all 100 search results have loaded.

I use this to wait until the search is done:

driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))

This works, but I need to get 100 search results, so I do this:

driver.get(driver.current_url+'&num=100')

But now it's not possible to reuse this wait, because an element with that ID is already present on the page:

driver.wait.until(EC.presence_of_element_located(
    (By.ID, 'resultStats')))

Instead I use this, but it's not reliable (it breaks if the request takes more than 5 seconds):

time.sleep(5)

Full code:

url = 'https://www.google.com'
driver.get(url)

try:
    box = driver.wait.until(EC.presence_of_element_located(
        (By.NAME, 'q')))
    box.send_keys(query.decode('utf-8'))
    button = driver.wait.until(EC.element_to_be_clickable(
        (By.NAME, 'btnG')))
    button.click()
except TimeoutException:
    error('Box or Button not found in google.com')

try:
    driver.wait.until(EC.presence_of_element_located(
        (By.ID, 'resultStats')))
    driver.get(driver.current_url+'&num=100')

    # Need a better solution to wait until all results are loaded
    time.sleep(5)

    print driver.find_element_by_tag_name('body').get_attribute('innerHTML').encode('utf-8')
except TimeoutException:
    error('No results returned by Google. Could be HTTP 503 response')
clarkk

1 Answer


You are absolutely right that time.sleep(5) is not a reliable way to wait for something on the page. Instead, use the WebDriverWait class together with a specific condition to wait for.

In this case, I'd wait until the number of elements with class="g" (each of which represents a search result) is greater than or equal to 100, using a custom Expected Condition:

from selenium.common.exceptions import StaleElementReferenceException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

class wait_for_n_elements(object):
    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

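    def __call__(self, driver):
        try:
            # Use the public find_elements API; the private helper
            # EC._find_elements is not available in newer Selenium releases.
            count = len(driver.find_elements(*self.locator))
            return count >= self.count
        except StaleElementReferenceException:
            # The page is still changing; report "not ready" and poll again.
            return False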

Usage:

wait = WebDriverWait(driver, 10)
wait.until(wait_for_n_elements((By.CSS_SELECTOR, ".g"), 100))
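
To see how such a condition behaves while it is being polled, here is a minimal sketch that runs without a browser. Note that `FakeDriver` and `poll_until` are hypothetical stand-ins and not Selenium APIs: the first pretends that 25 more results appear on every lookup, and the second only approximates what `WebDriverWait.until` does, calling the condition repeatedly until it returns a truthy value or the timeout expires.

```python
import time


class wait_for_n_elements(object):
    """Condition: truthy once the locator matches at least `count` elements."""

    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        return len(driver.find_elements(*self.locator)) >= self.count


class FakeDriver(object):
    """Hypothetical stand-in for a WebDriver: each lookup 'loads' 25 more results."""

    def __init__(self):
        self.loaded = 0
        self.lookups = 0

    def find_elements(self, by, value):
        self.lookups += 1
        self.loaded += 25
        return ["result"] * self.loaded


def poll_until(driver, condition, timeout=2.0, poll_frequency=0.01):
    """Simplified polling loop, assumed to approximate WebDriverWait.until."""
    deadline = time.time() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.time() > deadline:
            raise RuntimeError("condition not met within %.2f s" % timeout)
        time.sleep(poll_frequency)


driver = FakeDriver()
# "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
result = poll_until(driver, wait_for_n_elements(("css selector", ".g"), 100))
print(result, driver.lookups)
```

With a real driver, the `wait.until(...)` call above plays the role of `poll_until`, and it raises `TimeoutException` if fewer than 100 results show up within the 10 seconds passed to `WebDriverWait`.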
alecxe