I have a web crawler using Selenium and Chromium running on Ubuntu Linux 16.04. Each new crawling request comes in through Apache/WSGI, which creates a new Python thread for the request and spawns a Chromium process (via pyvirtualdisplay and Xvfb) to load the website, log in, take screenshots, etc.
I start Chromium with the flags --disable-extensions, --disable-gpu, --headless and --no-sandbox, and set the page load strategy to "none":
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

caps = DesiredCapabilities.CHROME.copy()  # copy so the shared class dict isn't mutated
caps["pageLoadStrategy"] = "none"
I then have a function that checks every second whether the page has loaded yet (some pages never fully load within a reasonable time, so I wait until they're at least interactive before proceeding):
driver.execute_script("return document.readyState;")
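The checker is essentially a once-per-second poll, sketched here (wait_until_interactive is just an illustrative name):

import time
from selenium.common.exceptions import TimeoutException

def wait_until_interactive(driver, timeout=30):
    # Poll document.readyState once per second until the page is at
    # least "interactive", or give up after `timeout` seconds.
    for _ in range(timeout):
        state = driver.execute_script("return document.readyState;")
        if state in ("interactive", "complete"):
            return state
        time.sleep(1)
    raise TimeoutException("page never became interactive")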
The weird thing is that now, when I load a page, it immediately reports state 'complete' (and continues to for the next 15 seconds). But when I then try to find an element, it can't be found, so I don't think the page has actually loaded. Normally it reports 'loading', then 'interactive', and so on.
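For example, something like this (the id is just a placeholder) raises NoSuchElementException even while readyState reports 'complete':

from selenium.common.exceptions import NoSuchElementException

try:
    driver.find_element_by_id("login")  # placeholder id; any element on the page behaves the same
except NoSuchElementException:
    print("readyState says 'complete', but the element isn't there")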
I've tried restarting Apache, but that doesn't seem to have fixed anything. What could be wrong?
I can see in my process list that Chromium and Xvfb are indeed running when a new request comes in:
7429 ? S 0:00 Xvfb -br -nolisten tcp -screen 0 1024x768x24 :2165
7430 ? Sl 0:00 /var/www/html/flaskapp/chromedriver --port=39146
7438 ? Sl 0:00 /usr/lib/chromium-browser/chromium-browser --disable-background-networking --disable-client-side-phishing
7440 ? S 0:00 /usr/lib/chromium-browser/chromium-browser --type=zygote --no-sandbox --enable-logging --headless --log-l
7457 ? Sl 0:00 /usr/lib/chromium-browser/chromium-browser --type=gpu-process --no-sandbox --enable-logging --headless --
7468 ? S 0:00 /usr/sbin/apache2 -k start
7469 ? Sl 0:00 /usr/lib/chromium-browser/chromium-browser --type=renderer --no-sandbox --enable-automation --enable-logg