I have an application where I need a long-running instance of Selenium WebDriver (I am using ChromeDriver 83.0.4103.39 in headless mode). Basically, the app continuously pulls URL data from a queue and passes the extracted URL to Selenium, which performs some analysis on the website. Many of these websites may be down, unreachable or broken, so I've set a page load timeout of 10 seconds to keep Selenium from waiting forever for a page to load.
The problem is that after some execution time (let's say 10 minutes) Selenium starts to give a Timed out receiving message from renderer error for every URL. Initially it works properly: it correctly opens the good websites and times out on the bad ones (those that fail to load). After some time, however, it starts to time out on everything, even websites that should open correctly (I've checked, they open correctly in the Chrome browser). I am having a hard time debugging this problem, since every exception in the application is caught correctly. I have also noticed that this problem happens only in headless mode.

UPDATE
During website analysis I also need to consider iframes (top level only), so I've also added logic to switch the driver context to each iframe in the main page and extract its HTML.

This is a simplified version of the application:

import traceback
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

width = 1024
height = 768

chrome_options = Options()
chrome_options.page_load_strategy = 'normal'
chrome_options.add_argument('--enable-automation')
chrome_options.add_argument('disable-infobars')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--lang=en')
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.add_argument('--allow-insecure-localhost')
chrome_options.add_argument('--allow-running-insecure-content')
chrome_options.add_argument('--disable-notifications')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-browser-side-navigation')
chrome_options.add_argument('--mute-audio')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--force-device-scale-factor=1')
chrome_options.add_argument(f'window-size={width}x{height}')

chrome_options.add_experimental_option(
    'prefs', {
        'intl.accept_languages': 'en,en_US',
        'download.prompt_for_download': False,
        'download.default_directory': '/dev/null',
        'automatic_downloads': 2,
        'download_restrictions': 3,
        'notifications': 2,
        'media_stream': 2,
        'media_stream_mic': 2,
        'media_stream_camera': 2,
        'durable_storage': 2,
    }
)

driver = webdriver.Chrome(options=chrome_options)
driver.set_page_load_timeout(10)  # Timeout 10 seconds

# Polling queue
while True:
    url = queue.pop()

    # Try open url
    try:
        driver.get(url)
    except BaseException as e:
        print(e)
        print(traceback.format_exc())
        continue

    # Take website screenshot
    png = driver.get_screenshot_as_png()

    # Extract html from iframes (if any)
    htmls = [driver.page_source]
    iframes = driver.find_elements_by_xpath("//iframe")

    for index, iframe in enumerate(iframes):
        try:
            driver.switch_to.frame(index)
            htmls.append(driver.page_source)
            driver.switch_to.default_content()
        except BaseException as e:
            print(e)
            print(traceback.format_exc())
            continue

    # Do some analysis
    for html in htmls:
        # ...
        pass

    # Wait a bit
    sleep(0.1)

This is an example of stack trace:

Opening https://www.yourmechanic.com/user/appointment/3732777/?access_token=HLZYIg&ukey=6quWpg1724633&rcode=abttgi&utm_medium=sms&utm_source= rb
LOAD EXCEPTION Message: timeout: Timed out receiving message from renderer: 10.000
  (Session info: headless chrome=83.0.4103.116)

Traceback (most recent call last):
  File "/Users/macbmacbookpro4ookpro4/Documents/Projects/python/proj001/main.py", line 202, in inference
    driver.get(url)
  File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/opt/anaconda3/envs/cv/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 10.000
  (Session info: headless chrome=83.0.4103.116)

Does anyone have any clue why, after a while of correct execution, Selenium starts to throw a timeout exception for every URL it tries to open?

revy

1 Answer

This error message...

selenium.common.exceptions.TimeoutException: Message: timeout: Timed out receiving message from renderer: 10.000

...implies that ChromeDriver was unable to communicate with the browsing context, i.e. the Chrome browser session.


Deep Dive

This error can arise due to several reasons. A couple of those reasons and the remedy are as follows:

  • disable-infobars and --enable-automation are almost analogous, and disable-infobars is no longer maintained. --enable-automation will serve your purpose, so you need to drop:

    chrome_options.add_argument('disable-infobars')
    

You can find a detailed discussion in Unable to hide “Chrome is being controlled by automated software” infobar within Chrome v76

  • --enable-automation is still an experimental option, so you need to use:

    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    

You can find a detailed discussion in How can I use setExperimentalOption through Options using FirefoxDriver in Selenium IDE?

  • If you intend to use --enable-automation, you need to set useAutomationExtension as well:

    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    
  • --disable-gpu is no longer necessary, so you need to drop:

    chrome_options.add_argument('--disable-gpu')
    

You can find a detailed discussion in Chrome Options in Python Selenium : Disable GPU vs Headless

  • You can opt to use a bigger viewport by passing window-size as width,height (comma-separated), e.g. 1920,1080:

    chrome_options.add_argument("window-size=1920,1080")
    

You can find a detailed discussion in How to set window size in Selenium Chrome Python

  • To initiate a headless browsing context, instead of chrome_options.add_argument('--headless') you need to use the headless attribute as follows:

    chrome_options.headless = True
    

You can find a detailed discussion in How to configure ChromeDriver to initiate Chrome browser in Headless mode through Selenium?
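
Taken together, the option changes listed so far might look like the sketch below. The helper names build_chrome_arguments and build_chrome_options are hypothetical, and the Selenium import is deferred so the argument list can be inspected without a browser installed. As an assumption of this sketch, --headless is passed as a plain argument, because the headless attribute is not available in every Selenium release.

```python
def build_chrome_arguments(width=1920, height=1080):
    """Return the Chrome command-line switches kept after the cleanup above."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--mute-audio",
        f"--window-size={width},{height}",  # comma-separated, not "WxH"
    ]


def build_chrome_options(width=1920, height=1080):
    """Build a Selenium Options object (requires the selenium package)."""
    from selenium.webdriver.chrome.options import Options  # deferred import

    options = Options()
    for argument in build_chrome_arguments(width, height):
        options.add_argument(argument)
    # Exclude the automation switch and extension, as suggested above.
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)
    return options
```

With Selenium installed, the result would be passed as webdriver.Chrome(options=build_chrome_options()).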

  • As you have enumerated all the elements, it is worth mentioning that you can't switch to all the <iframe> / <frame> elements, as some of them may have the style attribute set to display: none;.

You can find a detailed discussion in Expected condition failed: waiting for element to be clickable for element containing style=“display: none;”
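
Following the point about hidden frames, a minimal sketch of the iframe loop could skip frames that are not displayed, switch by element instead of by index, and always return to the top-level document. The function name collect_frame_sources is hypothetical. It relies only on driver methods (find_elements, switch_to.frame, switch_to.default_content) and WebElement.is_displayed, so it needs no Selenium import of its own.

```python
def collect_frame_sources(driver):
    """Return the main page HTML plus the HTML of each visible top-level iframe."""
    sources = [driver.page_source]
    # "tag name" is the string value of Selenium's By.TAG_NAME locator.
    for frame in driver.find_elements("tag name", "iframe"):
        try:
            if not frame.is_displayed():
                continue  # skip frames hidden with e.g. display: none
            driver.switch_to.frame(frame)  # switch by element, not by index
            sources.append(driver.page_source)
        except Exception as error:  # e.g. a stale or detached frame
            print(f"Skipping iframe: {error}")
        finally:
            # Always return to the top-level document before the next frame.
            driver.switch_to.default_content()
    return sources
```

In the question's loop this would replace the enumerate(iframes) block, e.g. htmls = collect_frame_sources(driver).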


References

You can find a couple of relevant detailed discussions on Timed out receiving message from renderer in:

undetected Selenium
  • Thanks for the answer! I've updated my question with a missing logic I was using to access iframes content (if any) on the website. I don't know if that can be related with the timeout problem though. – revy Jul 14 '20 at 12:27
  • @revy Updated the answer with some of iframe related issues and solution. Let me know if that solves your issue. – undetected Selenium Jul 14 '20 at 14:45
  • Thank you, I have updated my iframe logic to use `frame_to_be_available_and_switch_to_it` instead of `switch_to.frame`. It correctly processed 1000 URLs or so, then started again to fail every `get(url)` with a timeout exception... Anyway, it must be something related to iframes, because if I drop the iframe logic entirely the problem doesn't show up. – revy Jul 14 '20 at 16:27
  • But unfortunately the problem still persists. I was able to process 1000 URLs after the change, but at some point it still starts to fail with a timeout for each URL (before the change it started to fail after 500-700 URLs or so). I need the solution to be robust: even if some URL fails to load, it should continue to run correctly without interruption... I guess there is some race condition or similar related to iframe switching that makes it start to fail forever with timeouts, but I wasn't able to catch it so far. – revy Jul 14 '20 at 17:05
  • Update: I've removed the iframe logic and run a test again. It processed about 4500 URLs, then started again to throw timeouts on everything... and finally hung on URL 4753 (without any exception, it just hangs indefinitely). I don't know what else to try. – revy Jul 15 '20 at 07:51
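
For the robustness requirement raised in the comments, one workaround that this thread does not confirm is to discard the browser and start a fresh one after several consecutive load failures. The sketch below assumes a new Chrome process clears whatever state causes the permanent timeouts. process_urls, make_driver and max_consecutive_failures are hypothetical names; make_driver is any zero-argument factory, e.g. lambda: webdriver.Chrome(options=chrome_options).

```python
def process_urls(urls, make_driver, max_consecutive_failures=5):
    """Visit each URL, replacing the driver after repeated failures.

    Returns the list of URLs that loaded without raising.
    """
    driver = make_driver()
    loaded, failures = [], 0
    try:
        for url in urls:
            try:
                driver.get(url)
                loaded.append(url)
                failures = 0  # a successful load resets the counter
            except Exception as error:
                failures += 1
                print(f"Failed to load {url}: {error}")
            if failures >= max_consecutive_failures:
                # Assumption: a fresh browser process clears the stuck state.
                driver.quit()
                driver = make_driver()
                failures = 0
    finally:
        driver.quit()
    return loaded
```

The threshold is a trade-off: too low and a run of genuinely dead sites triggers needless restarts, too high and many good URLs are lost before the browser is replaced.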