
I have a web crawler using Selenium and Chromium running on Ubuntu Linux 16.04. All new crawling requests come in through Apache/WSGI, which creates a new Python thread for each request; that thread spawns a Chromium process with pyvirtualdisplay and Xvfb to load the website, log in, take screenshots, etc.
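A minimal sketch of the per-request lifecycle described above, assuming each WSGI thread gets its own display and driver. `run_crawl_request` and its factory parameters are hypothetical names; the display only needs `start()`/`stop()` (as pyvirtualdisplay's `Display` provides) and the driver only needs `quit()` (as Selenium's WebDriver provides). The nested `try`/`finally` ensures Chromium, chromedriver and Xvfb are torn down even when a page load or login step raises.

```python
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")


class Display(Protocol):
    def start(self) -> object: ...
    def stop(self) -> object: ...


class Driver(Protocol):
    def quit(self) -> None: ...


def run_crawl_request(
    make_display: Callable[[], Display],
    make_driver: Callable[[], Driver],
    task: Callable[[Driver], T],
) -> T:
    """Run one crawl with its own virtual display and browser (hypothetical helper).

    Teardown runs in reverse order of setup and happens even if the task
    raises, so a failed request does not leave its processes behind.
    """
    display = make_display()
    display.start()
    try:
        driver = make_driver()
        try:
            return task(driver)
        finally:
            # Selenium's quit() closes Chromium and stops the chromedriver it started.
            driver.quit()
    finally:
        display.stop()  # stops this request's Xvfb server
```

With the real libraries this might be called as `run_crawl_request(lambda: Display(visible=0, size=(1024, 768)), lambda: webdriver.Chrome(options=options), crawl_site)`, where `options` and `crawl_site` are placeholders for your own Chromium options and crawl function.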

I launch Chromium with the flags --disable-extensions, --disable-gpu, --headless and --no-sandbox, and set the page load strategy through the desired capabilities:

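from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
caps = DesiredCapabilities.CHROME.copy()  # copy so the shared class-level dict is not mutated
caps["pageLoadStrategy"] = "none"  # don't block navigation until the page finishes loading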

I then have a function that checks every second whether the page has loaded yet (some of the pages don't fully load within a reasonable time, so I wait until they're at least interactive before proceeding):

state = driver.execute_script("return document.readyState")
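The once-per-second check described above could look roughly like this. `wait_for_ready_state` is a hypothetical helper: it takes the state getter as a callable so the timing logic does not depend on a live browser, and the 15-second timeout is an illustrative default, not a value from the question. The injectable `clock` and `sleep` exist only to make the loop testable.

```python
import time
from typing import Callable, Collection


def wait_for_ready_state(
    get_state: Callable[[], str],
    accepted: Collection[str] = ("interactive", "complete"),
    timeout: float = 15.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it returns an accepted value or the timeout expires.

    Returns the accepted state; raises TimeoutError with the last seen state otherwise.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in accepted:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"readyState still {state!r} after {timeout}s")
        sleep(interval)  # wait before polling again
```

In the crawler, `get_state` would be something like `lambda: driver.execute_script("return document.readyState")`.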

The weird thing is that now, when I try to load a page, it immediately reports the state 'complete' (and keeps doing so for the next 15 seconds). But when I actually try to find an element, it can't be found, so I don't think the page has really loaded. Normally it reports 'loading', then 'interactive', and so on.
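One possible explanation, offered as an assumption rather than a diagnosis: with `pageLoadStrategy` set to "none", `driver.get()` can return before the new document replaces the old one, so `document.readyState` may still be read from the previous page (a blank tab reports 'complete'). The sketch tags the current window with a marker before navigating, then treats the page as ready only once the marker is gone and the element you need exists. The marker name `__crawlerOldPage` and `load_and_wait` are hypothetical, and the trick assumes the navigation creates a fresh `window` object, which is worth verifying for your pages.

```python
import time
from typing import Any, Callable

# JavaScript snippets; the marker property name is a hypothetical choice.
MARK_OLD_PAGE = "window.__crawlerOldPage = true;"
READ_STATE = "return window.__crawlerOldPage ? 'old-page' : document.readyState;"


def load_and_wait(
    navigate: Callable[[str], None],
    execute_script: Callable[[str], Any],
    element_present: Callable[[], bool],
    url: str,
    timeout: float = 15.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Navigate to url and wait until a *new* document is ready and has the element."""
    execute_script(MARK_OLD_PAGE)  # tag the document we are leaving
    navigate(url)
    deadline = clock() + timeout
    while True:
        state = execute_script(READ_STATE)
        # 'old-page' means the marker is still there, i.e. no new document yet.
        if state in ("interactive", "complete") and element_present():
            return state
        if clock() >= deadline:
            raise TimeoutError(f"page not ready (last state {state!r})")
        sleep(interval)
```

With Selenium this might be called as `load_and_wait(driver.get, driver.execute_script, lambda: bool(driver.find_elements(By.ID, "login-form")), url)`, where `By` comes from `selenium.webdriver.common.by` and `"login-form"` is a placeholder element id.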

I've tried restarting Apache, but that doesn't seem to have fixed anything. What could be wrong?

I can see in my process list that Chromium and Xvfb are indeed running when the new request comes in:

7429 ?        S      0:00 Xvfb -br -nolisten tcp -screen 0 1024x768x24 :2165
7430 ?        Sl     0:00 /var/www/html/flaskapp/chromedriver --port=39146
7438 ?        Sl     0:00 /usr/lib/chromium-browser/chromium-browser --disable-background-networking --disable-client-side-phishing
7440 ?        S      0:00 /usr/lib/chromium-browser/chromium-browser --type=zygote --no-sandbox --enable-logging --headless --log-l
7457 ?        Sl     0:00 /usr/lib/chromium-browser/chromium-browser --type=gpu-process --no-sandbox --enable-logging --headless --
7468 ?        S      0:00 /usr/sbin/apache2 -k start
7469 ?        Sl     0:00 /usr/lib/chromium-browser/chromium-browser --type=renderer --no-sandbox --enable-automation --enable-logg
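To check whether processes from earlier requests are piling up, a listing like the one above can be summarised per program. `count_crawler_processes` is a hypothetical helper that assumes `ps ax`-style columns (PID, TTY, STAT, TIME, COMMAND); it only inspects the executable path, so truncated arguments do not matter.

```python
from collections import Counter
from typing import Iterable

# Executables a crawl request starts, as they appear in the listing above.
WATCHED = ("Xvfb", "chromedriver", "chromium-browser")


def count_crawler_processes(ps_lines: Iterable[str]) -> Counter:
    """Count Xvfb, chromedriver and Chromium processes in `ps ax`-style lines."""
    counts: Counter = Counter()
    for line in ps_lines:
        fields = line.split(None, 4)  # PID, TTY, STAT, TIME, COMMAND
        if len(fields) < 5:
            continue
        program = fields[4].split()[0]  # executable path, without arguments
        for name in WATCHED:
            if program.endswith(name):
                counts[name] += 1
                break
    return counts
```

Feeding it `subprocess.run(["ps", "ax"], capture_output=True, text=True).stdout.splitlines()` before and after a few requests would show whether the Xvfb and Chromium counts keep growing, which would suggest some requests are not shutting down their display or browser.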
undetected Selenium
skunkwerk

1 Answer


You need to configure the ChromeDriver with the required parameters. A few points:

undetected Selenium