I am trying to get past the 'page is loading' trick that some websites use to deter scrapers. I therefore need the simplest code that waits the proper amount of time before I check the HTML content of the scraped website. The condition I am looking for is: "Wait until the page has a title OR a meta description OR keywords OR just any text other than 'loading', 'wait', etc." I have searched for several hours for this simple-looking thing, but to no avail; it seems that using Selenium is not as easy as I thought.

import undetected_chromedriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


options = Options()
options.add_argument('--ignore-ssl-errors=yes')
options.add_argument('--ignore-certificate-errors')
options.add_argument('--incognito')
options.add_argument('--headless')

driver = undetected_chromedriver.Chrome(service=Service(ChromeDriverManager().install()), 
                            use_subprocess=True,
                            options=options)

# timeout = 5
# wait = WebDriverWait(driver, timeout)
# wait.until("WHAT????")

web_link = "amazon.com"
driver.get(f"http://{web_link}")
html = driver.page_source  # HTML to inspect once the page is ready

Note: I would like to achieve this using an explicit wait rather than an implicit wait, since some of the websites I want to scrape load without any sort of scraper protection. It is best not to lose time on those.

Semzem

1 Answer

To wait until the page title contains certain text, you can use either of the following expected_conditions:

  • title_contains(title): The expectation for checking that the title contains a case-sensitive substring.

    WebDriverWait(driver, 10).until(EC.title_contains("partial_expected_page_title"))
    
  • title_is(title): The expectation for checking that the title of a page is an exact match for the given string.

    WebDriverWait(driver, 10).until(EC.title_is("expected_page_title"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
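
The question asks for an OR of several conditions rather than a known title. Since WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, one possible approach is a small custom condition. The following is only a sketch: the function name page_has_real_content and the PLACEHOLDER word list are hypothetical, it checks the title and the description/keywords meta tags only (not the body text), and it assumes the placeholder page uses wording like "loading" or "please wait".

```python
import re

# Hypothetical list of placeholder phrases (an assumption; adjust per site).
PLACEHOLDER = re.compile(r"loading|please wait|just a moment", re.IGNORECASE)


def page_has_real_content(driver):
    """Return True once the title or a description/keywords meta tag
    holds non-empty text that does not look like a loading placeholder."""
    candidates = [driver.title or ""]
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR.
    metas = driver.find_elements(
        "css selector", 'meta[name="description"], meta[name="keywords"]'
    )
    for meta in metas:
        candidates.append(meta.get_attribute("content") or "")
    return any(
        bool(text.strip()) and not PLACEHOLDER.search(text)
        for text in candidates
    )
```

It would be used after driver.get(...) as WebDriverWait(driver, 10).until(page_has_real_content). The wait returns as soon as the function returns True, so pages without any protection should not be delayed noticeably, and it raises TimeoutException if the condition is never met within the timeout.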
    


undetected Selenium