24

Selenium's driver.get(url) waits until the page has fully loaded. But the page I'm scraping tries to load a dead JS script, so my Python script waits for it and hangs for a few minutes. This problem can occur on any page of the site.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000')
# It tries to load: https://www.cetelem.es/eCommerceCalculadora/resources/js/eCalculadoraCetelemCombo.js
driver.find_element_by_name('ANCHO').send_keys("100")

How can I limit the wait time or block the AJAX load of a file, or is there another way?

Also, I'm testing my script with webdriver.Chrome(), but I will use PhantomJS() or probably Firefox(). So if a method relies on changing browser settings, it must be universal.

undetected Selenium
bl79

2 Answers

53

By default, Selenium loads a page/URL with pageLoadStrategy set to normal, which means driver.get() blocks until the page has fully loaded. To stop Selenium from waiting for the full page load, configure pageLoadStrategy. It supports 3 values:

  1. normal (waits for the full page load; document.readyState is "complete")
  2. eager (waits until the DOM is ready; document.readyState is "interactive")
  3. none (returns without waiting for the page to load)

Here is the code block to configure pageLoadStrategy:

  • Firefox :

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    
    caps = DesiredCapabilities().FIREFOX
    caps["pageLoadStrategy"] = "normal"  #  complete
    #caps["pageLoadStrategy"] = "eager"  #  interactive
    #caps["pageLoadStrategy"] = "none"
    driver = webdriver.Firefox(desired_capabilities=caps, executable_path=r'C:\path\to\geckodriver.exe')
    driver.get("http://google.com")
    
  • Chrome :

    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    
    caps = DesiredCapabilities().CHROME
    caps["pageLoadStrategy"] = "normal"  #  complete
    #caps["pageLoadStrategy"] = "eager"  #  interactive
    #caps["pageLoadStrategy"] = "none"
    driver = webdriver.Chrome(desired_capabilities=caps, executable_path=r'C:\path\to\chromedriver.exe')
    driver.get("http://google.com")
    

Note: the pageLoadStrategy values normal, eager and none are required by the WebDriver W3C Editor's Draft, but the eager value is still a WIP (Work In Progress) in the ChromeDriver implementation. You can find a detailed discussion in “Eager” Page Load Strategy workaround for Chromedriver Selenium in Python.
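The snippets above use the Selenium 3 style (desired_capabilities, executable_path), which later Selenium 4 releases no longer accept. A minimal sketch of the same setting in Selenium 4 style, assuming Chrome and a driver that Selenium can locate on its own; build_chrome_options is a hypothetical helper name, and the URL and the ANCHO field are taken from the question:

```python
VALID_STRATEGIES = ("normal", "eager", "none")


def build_chrome_options(strategy="eager"):
    """Return Chrome options with the given page load strategy (hypothetical helper)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError("unknown page load strategy: %r" % (strategy,))
    # Imported here so the validation above works even where Selenium is not installed.
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.page_load_strategy = strategy
    return options


if __name__ == "__main__":
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome(options=build_chrome_options("eager"))
    try:
        # With "eager", get() returns once the DOM is ready, without
        # waiting for slow or dead scripts and other subresources.
        driver.get("https://www.cortinadecor.com/productos/17/estores-enrollables-screen/estores-screen-corti-3000")
        driver.find_element(By.NAME, "ANCHO").send_keys("100")
    finally:
        driver.quit()
```

The same page_load_strategy attribute is available on the Firefox options class, so the approach should carry over by swapping the Options import and the driver class.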

undetected Selenium
  • 2
    It works in Firefox(). In Chrome() the "eager" option throws an "unsupported" error. The following runs: `caps = DesiredCapabilities().CHROME` `caps["pageLoadStrategy"] = "none"` `driver = webdriver.Chrome(desired_capabilities=caps)` `driver.get('https://href...')` `time.sleep(5)` `driver.find_element_by_name('ANCHO').send_keys("100")` – bl79 Jun 28 '17 at 04:55
  • @bl79 Yes :) I know. What I suggested is from WebDriver's W3C recommendation. ChromeDriver will follow suit soon. Thanks – undetected Selenium Jun 28 '17 at 04:58
  • Instead of `time.sleep`, it's better to use `driver.implicitly_wait` – Kamil Sindi Jun 24 '18 at 23:36
  • 1
    Chrome still hasn't followed suit and it doesn't seem like they will – Tim Wachter Aug 30 '18 at 10:07
  • @TimWachter Check out my answer update and let me know your thoughts. – undetected Selenium Oct 13 '18 at 08:24
  • @DebanjanB Though `caps["pageLoadStrategy"] = "none"` gives back control over the driver in Chrome, it's quite useless if the page is still loading. You can't call `driver.execute_script("window.stop();")`; it won't work in Chrome, but in Firefox it works perfectly fine. – Ja8zyjits May 03 '19 at 09:35
  • What's the difference between using eager and getting driver.page_source after the timeout exception? – user2396640 Oct 24 '22 at 13:39
  • This answer needs to be updated for Selenium 4. – Rocky Li Aug 07 '23 at 03:47
0

@undetected Selenium's answer works well, but the Chrome part does not work. Use the code below for Chrome:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

options = webdriver.ChromeOptions()  # was not defined in the original snippet
capa = DesiredCapabilities.CHROME.copy()
capa["pageLoadStrategy"] = "none"  # get() returns without waiting for the page load
browser = webdriver.Chrome(desired_capabilities=capa, executable_path='PATH', options=options)
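
With pageLoadStrategy set to none, get() can return before the target element exists, so something still has to wait for it. A minimal sketch using an explicit wait rather than a fixed sleep, assuming Selenium 4's By locators; wait_for_field is a hypothetical helper name, and the 10-second default is an arbitrary choice:

```python
def wait_for_field(driver, name, timeout=10):
    """Block until an element with the given name attribute is in the DOM.

    Returns the element, or raises selenium's TimeoutException after
    `timeout` seconds. Hypothetical helper, not part of Selenium.
    """
    # Imported here so the module can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # WebDriverWait polls the condition until it returns a truthy value.
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.NAME, name))
    )
```

After `browser.get(url)`, a call such as `wait_for_field(browser, "ANCHO").send_keys("100")` would then proceed as soon as the field appears, instead of sleeping for a fixed time.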

Darkknight