
I've been dealing with this issue for the past week and can't get my head around it, so I decided to ask for help. I'm trying to run Selenium in AWS Lambda using a Chromium 86 build. The error message I keep getting is the following:

{
  "errorMessage": "Message: unknown error: Chrome failed to start: exited abnormally.\n  (chrome not reachable)\n  (The process started from chrome location /opt/bin/chromium is no longer running, so ChromeDriver is assuming that Chrome has crashed.)\n",
  "errorType": "WebDriverException"
}

Here's my build:

Selenium 3.14
Chromium 86.0.4240.0 (https://github.com/vittorio-nardone/selenium-chromium-lambda/blob/master/chromium.zip) which is forked from (https://github.com/puppeteer/puppeteer)
Chromedriver 86.0.4240.22 (https://chromedriver.storage.googleapis.com/index.html?path=86.0.4240.22/)
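
ChromeDriver releases are generally tied to a specific Chrome/Chromium major version, so a quick sanity check is to compare the first component of both version strings. A minimal sketch, assuming plain dotted version strings; the helper names major_version and same_major_version are hypothetical and not part of Selenium:

```python
def major_version(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def same_major_version(browser_version: str, driver_version: str) -> bool:
    """True when the browser and the driver share a major version."""
    return major_version(browser_version) == major_version(driver_version)


# Version strings from the build listed above.
print(same_major_version("86.0.4240.0", "86.0.4240.22"))
```

A matching major version is necessary but does not by itself guarantee that the browser will start in the Lambda environment.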

Here's my code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    chrome_options = webdriver.ChromeOptions()
#   chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument("start-maximized")
    chrome_options.add_argument("disable-infobars")
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--window-size=1024x768')
    chrome_options.add_argument('--user-data-dir=/tmp/user-data')
    chrome_options.add_argument('--profile-directory=/tmp')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--enable-logging')
    chrome_options.add_argument('--log-level=0')
    chrome_options.add_argument('--v=99')
#   chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--data-path=/tmp/data-path')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--homedir=/tmp')
    chrome_options.add_argument('--disk-cache-dir=/tmp/cache-dir')
    chrome_options.add_argument('user-agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.3163.100 Safari/537.36')
    chrome_options.add_argument('--remote-debugging-port=9222')
    chrome_options.binary_location = "/opt/bin/chromium"

    driver = webdriver.Chrome(executable_path="/opt/bin/chromedriver",options=chrome_options)
    driver.get('https://www.google.com/')

The things I have tried so far:

  1. Tried various runtimes (Python 3.6, 3.7, 3.8), no success.
  2. Tried with and without Lambda layers. When using a Lambda layer, my folder structure is relatively simple:
.
├── bin
│   ├── chromedriver (binary)
│   └── chromium (binary)
└── python
    ├── selenium
    ├── selenium-3.14.0.dist-info
    ├── urllib3
    └── urllib3-1.26.7.dist-info
  3. Gone through the majority of the questions here on SO where similar issues have been discussed, for example:

Chrome Driver and Chromium Binaries are not working on aws lambda

WebDriverException: Message: unknown error: Chrome failed to start: crashed error using ChromeDriver Chrome through Selenium Python on Amazon Linux ..etc

  4. Tried almost all combinations of the arguments that I'm passing to the chromedriver, e.g. with and without --disable-dev-shm-usage, with and without --disable-gpu, etc.
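
One more check that fits the list above is to run each binary directly inside the function and look at what it prints. If a shared library is missing, the dynamic loader usually reports it on stderr, which ChromeDriver's "exited abnormally" message hides. A minimal sketch using only the standard library; the helper name check_binary is hypothetical, and the paths are the layer paths from the question:

```python
import os
import subprocess


def check_binary(path: str, timeout: int = 10) -> dict:
    """Report whether `path` exists, is executable, and what `--version` prints."""
    report = {
        "path": path,
        "exists": os.path.isfile(path),
        "executable": os.access(path, os.X_OK),
        "returncode": None,
        "output": None,
    }
    if report["exists"] and report["executable"]:
        try:
            # Merge stderr into stdout so loader errors are captured as well.
            proc = subprocess.run(
                [path, "--version"],
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                timeout=timeout,
            )
            report["returncode"] = proc.returncode
            report["output"] = proc.stdout.decode(errors="replace").strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["output"] = repr(exc)
    return report


if __name__ == "__main__":
    for candidate in ("/opt/bin/chromium", "/opt/bin/chromedriver"):
        print(check_binary(candidate))
```

Logging the returned dictionaries from inside lambda_handler shows whether the problem is a missing file, a missing execute permission, or a binary that cannot start at all. Whether a missing library is the cause in this particular case is an assumption.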

The only thing I noticed is that when I play with certain arguments, it sometimes throws "selenium.common.exceptions.WebDriverException: Message: unknown error: unable to discover open window in chrome" instead of the "Chrome failed to start: exited abnormally" error. My last idea is to compile my own Chromium 86 build. Has anyone managed to get build 86 or higher running on AWS Lambda?

vboxer00
  • Where is Chrome.exe located? – Sonali Das Dec 30 '21 at 15:23
  • The chrome binary is located either in a separate Layer attached to the Lambda function, in which case it's under /opt/bin/chromium, or, if I am not using any Layers, under the function itself. – vboxer00 Dec 30 '21 at 15:29

1 Answer


UPDATE 1/2/2022

I spent most of the last couple of days trying to figure out what could be wrong with my setup. Is it the code? The way I use Lambda/layers? The binaries? The runtime environment? There were too many moving parts, and I didn't want to fall back to Chromium 6x (my last working setup), as that is ancient and lacks certain features I needed, such as parts of the Chrome DevTools Protocol.

Then I stumbled across this repository, which shows how to run Selenium on Lambda from a container image:

https://github.com/umihico/docker-selenium-lambda

Basically, in a couple of minutes I was able to set up my container image linked to Lambda, and it's running:

  1. Python 3.9.8
  2. Chromium 96.0.4664.0
  3. Chromedriver 96.0.4664.45
  4. Selenium 4.1.0

Then I ported over my function code and, with a couple of changes, finally managed to get it working! Here are my working args:

from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--disable-dev-tools')
chrome_options.add_argument('--remote-debugging-port=9222')
chrome_options.add_argument('--window-size=1280x1696')
chrome_options.add_argument('--user-data-dir=/tmp/chrome-user-data')
chrome_options.add_argument('--single-process')
chrome_options.add_argument("--no-zygote")
chrome_options.add_argument('--ignore-certificate-errors')
chrome_options.binary_location = "/opt/chrome/chrome"

driver = webdriver.Chrome("/opt/chromedriver", options=chrome_options)
driver.get('https://www.google.com/')
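
For reference, the same arguments can be wrapped in a handler that always shuts the browser down, since Lambda may reuse an execution environment between invocations. This is a minimal sketch, assuming the Selenium 4.1.0 call style and the paths listed above. The function build_chrome_args and the returned dictionary are hypothetical, and newer Selenium releases expect a Service object instead of a positional driver path:

```python
def build_chrome_args():
    """Chrome flags from the working setup above, as a reusable list."""
    return [
        "--no-sandbox",
        "--headless",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--disable-dev-tools",
        "--remote-debugging-port=9222",
        "--window-size=1280x1696",
        "--user-data-dir=/tmp/chrome-user-data",
        "--single-process",
        "--no-zygote",
        "--ignore-certificate-errors",
    ]


def lambda_handler(event, context):
    # Imported inside the handler so this module can also be loaded
    # (for example in a local test) where Selenium is not installed.
    from selenium import webdriver

    chrome_options = webdriver.ChromeOptions()
    chrome_options.binary_location = "/opt/chrome/chrome"
    for arg in build_chrome_args():
        chrome_options.add_argument(arg)

    driver = webdriver.Chrome("/opt/chromedriver", options=chrome_options)
    try:
        driver.get("https://www.google.com/")
        return {"title": driver.title}
    finally:
        # Quit even when the page load fails, so no Chrome process is left behind.
        driver.quit()
```

Keeping the flag list in its own function makes it easy to toggle individual arguments while debugging without touching the handler.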

The main difference between this setup and a pure Lambda one is that here you use container images (hosted in Amazon ECR), and you are not running headless-chrome or serverless-chrome; instead, you run Chrome from the Chromium snapshot builds:

https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html

vboxer00
  • So chrome or chromium needs to be installed for selenium to work correctly...? – deostroll Mar 22 '22 at 16:22
  • Specifically, for this Lambda deployment I'm using a Chrome (Linux) snapshot from: https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html?prefix=Linux_x64/ – vboxer00 Mar 23 '22 at 11:36
  • vboxer - i am having the exact issue you were having. I need some help to get this to work on Ec2 - can you help? – Optional May 03 '22 at 03:57
  • You want this to run on an EC2 instance? If so, you just need to have Python installed on it and create your webscraper.py code. You could set it to run on a given schedule using a cron job. – vboxer00 May 08 '22 at 16:17