This is the current situation: I built a GitLab CI/CD pipeline that pushes a Docker image to Amazon ECR, and an AWS Lambda function runs that image. The Dockerfile (together with docker compose) starts from the base image public.ecr.aws/lambda/python:3.11, installs Chrome and chromedriver, and ends with CMD [ "main.lambda_handler" ]. The Dockerfile downloads and installs the Chrome browser like this:
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm \
&& yum localinstall -y google-chrome-stable_current_*.rpm
and chromedriver is installed by a fetch_chromedriver.sh script (the commented-out lines are there because I've tried many different approaches):
#!/bin/bash
set -x
# Fetch the latest release of ChromeDriver
#LATEST_RELEASE=$(curl -s https://chromedriver.storage.googleapis.com/LATEST_RELEASE)
#CHROMEDRIVER_URL="https://chromedriver.storage.googleapis.com/${LATEST_RELEASE}/chromedriver_linux64.zip"
#CHROMEDRIVER_URL="https://chromedriver.storage.googleapis.com/114.0.5735.90/chromedriver_linux64.zip"
CHROMEDRIVER_URL="https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing/115.0.5790.102/linux64/chromedriver-linux64.zip"
#CHROMEDRIVER_URL="https://chromedriver.storage.googleapis.com/72.0.3626.69/chromedriver_linux64.zip"
# Download and install ChromeDriver
curl -sSL "$CHROMEDRIVER_URL" -o chromedriver_linux64.zip \
&& unzip chromedriver_linux64.zip \
&& rm chromedriver_linux64.zip \
&& chmod +x chromedriver-linux64/chromedriver \
&& mv chromedriver-linux64/chromedriver /usr/local/bin/ \
&& rm -rf chromedriver-linux64 \
&& chmod -x /usr/local/bin/chromedriver \
&& chmod -x /usr/local/bin/*
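One detail in the script above that may be worth ruling out: the last two `chmod -x` calls remove the execute bit that `chmod +x` set a few lines earlier, so the installed driver may not be runnable. A minimal Python sketch for checking this from inside the container, assuming the driver path used in the script (`/usr/local/bin/chromedriver`); the helper name `describe_binary` is made up for illustration:

```python
import os
import shutil
import stat


def describe_binary(path: str) -> dict:
    """Report whether `path` exists and whether it looks executable."""
    exists = os.path.isfile(path)
    # Human-readable permission string such as "-rwxr-xr-x"
    mode = stat.filemode(os.stat(path).st_mode) if exists else None
    return {
        "path": path,
        "exists": exists,
        "mode": mode,
        "executable": exists and os.access(path, os.X_OK),
        # shutil.which returns None for a file that is not executable
        "which": shutil.which(path),
    }


if __name__ == "__main__":
    print(describe_binary("/usr/local/bin/chromedriver"))
```

If `mode` shows no `x` and `which` is `None`, the driver on disk cannot be started as it is, regardless of which version was downloaded.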
The image builds without problems, but when I invoke the Lambda function to test it, I get various errors once execution reaches the method that sets up the WebDriver (I left the different lines commented out so you can see everything I've tried):
def setup_webdriver(self) -> Chrome:
    """
    Sets up and returns the Chrome WebDriver.
    """
    opts = ChromeOptions()
    #opts.binary_location = "/usr/bin/google-chrome"
    # opts.add_argument("start-maximized") # open Browser in maximized mode
    # opts.add_argument("disable-infobars")
    # opts.add_argument('--no-sandbox')
    # opts.add_argument('--disable-dev-shm-usage')
    # opts.add_argument("--disable-gpu")
    # opts.add_argument('--disable-extensions')
    # opts.add_argument('--remote-debugging-port=9222')
    # # To show error logs more verbose:
    # opts.add_argument('--enable-logging')
    # opts.add_argument('--v=1')
    # opts.add_argument('--disable-software-rasterizer')
    # opts.add_argument('--disable-browser-side-navigation')
    # opts.add_argument('--crash-dumps-dir=/tmp')
    #opts.add_experimental_option("excludeSwitches", ["enable-automation"])
    #opts.add_experimental_option("detach", True)
    #opts.add_experimental_option("debuggerAddress", "localhost:9014")
    #opts.add_argument('--window-size=1920,1080')
    #opts.add_experimental_option("useAutomationExtension", False)
    #opts.add_argument('--user-data-dir=~/.config/google-chrome') # Changed this line with the line below because in Lambda the only writable directory is "/tmp"
    #opts.add_argument('--user-data-dir=/tmp/.config/google-chrome')
    # Set the temporary directory for ChromeDriverManager
    tmp_dir = tempfile.mkdtemp(dir='/tmp')
    os.environ["TMPDIR"] = tmp_dir
    opts.add_argument('--headless')
    # Get the latest version of ChromeDriver
    #version = "115.0.5790.102"
    # Install the latest version of ChromeDriver
    #driver_path = ChromeDriverManager(path="/tmp", version="115.0.5790.102/linux64", url="https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing", latest_release_url="https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing").install()
    #driver_path = ChromeDriverManager(path="/tmp", version="114.0.5735.90", url="https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing", latest_release_url="https://edgedl.me.gvt1.com/edgedl/chrome/chrome-for-testing").install()
    #driver_path = ChromeDriverManager().install()
    #driver_path = "/usr/local/bin/chromedriver"
    #service = Service(driver_path)
    service = Service()
    #service = Service(executable_path="/usr/local/bin/chromedriver")
    time.sleep(10)
    driver = Chrome(service=service, options=opts)
    driver.implicitly_wait(30)
    selenium_log = logging.getLogger('selenium')
    selenium_log.setLevel(logging.WARNING)
    selenium_log.propagate = False
    return driver
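For comparison, here is a stripped-down variant of the method above that pins both binaries explicitly, so that Selenium should not have to locate a driver on its own (provided the driver file is actually executable). This is only a sketch under assumptions: the paths `/usr/bin/google-chrome` and `/usr/local/bin/chromedriver` are taken from the commented-out lines and the install script, the flag set is a guess at what a container with no display and only `/tmp` writable needs, and `build_chrome_args` and `make_driver` are illustrative names, not part of Selenium:

```python
import tempfile

CHROME_BINARY = "/usr/bin/google-chrome"            # assumed install location
CHROMEDRIVER_PATH = "/usr/local/bin/chromedriver"   # from fetch_chromedriver.sh


def build_chrome_args(tmp_root: str = "/tmp") -> list[str]:
    """Return Chrome flags that keep the browser profile under tmp_root."""
    profile_dir = tempfile.mkdtemp(dir=tmp_root)
    return [
        "--headless",
        "--no-sandbox",
        "--disable-dev-shm-usage",
        "--disable-gpu",
        f"--user-data-dir={profile_dir}",
    ]


def make_driver():
    # Imported here so build_chrome_args can be used without Selenium installed.
    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.chrome.service import Service

    opts = ChromeOptions()
    opts.binary_location = CHROME_BINARY
    for arg in build_chrome_args():
        opts.add_argument(arg)
    service = Service(executable_path=CHROMEDRIVER_PATH)
    return Chrome(service=service, options=opts)
```

Keeping the flag list in its own function makes it easy to log or adjust the exact arguments without starting a browser.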
The error I'm currently getting is below (in short, Selenium Manager is unable to obtain chromedriver):
Exception: Message: Unable to obtain chromedriver using Selenium Manager; Message: Unsuccessful command executed: /var/lang/lib/python3.11/site-packages/selenium/webdriver/common/linux/selenium-manager --browser chrome --output json; Expecting value: line 1 column 1 (char 0)
; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location

Traceback (most recent call last):
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/selenium_manager.py", line 115, in run
    output = json.loads(stdout)
             ^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/driver_finder.py", line 42, in get_path
    path = SeleniumManager().driver_location(options) if path is None else path
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/selenium_manager.py", line 96, in driver_location
    result = self.run(args)
             ^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/selenium_manager.py", line 118, in run
    raise WebDriverException(f"Unsuccessful command executed: {command}; {err}")
selenium.common.exceptions.WebDriverException: Message: Unsuccessful command executed: /var/lang/lib/python3.11/site-packages/selenium/webdriver/common/linux/selenium-manager --browser chrome --output json; Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/main.py", line 10, in lambda_handler
    scrape_and_upload_to_s3()
  File "/var/task/run.py", line 30, in scrape_and_upload_to_s3
    driver = spain_data_extractor.setup_webdriver()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/scraper.py", line 99, in setup_webdriver
    driver = Chrome(service=service, options=opts)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/chrome/webdriver.py", line 47, in __init__
    self.service.path = DriverFinder.get_path(self.service, self.options)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/driver_finder.py", line 44, in get_path
    raise NoSuchDriverException(f"Unable to obtain {service.path} using Selenium Manager; {err}")
selenium.common.exceptions.NoSuchDriverException: Message: Unable to obtain chromedriver using Selenium Manager; Message: Unsuccessful command executed: /var/lang/lib/python3.11/site-packages/selenium/webdriver/common/linux/selenium-manager --browser chrome --output json; Expecting value: line 1 column 1 (char 0)
; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location
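The `JSONDecodeError` ("Expecting value: line 1 column 1 (char 0)") at the start of the chain suggests that the `selenium-manager` binary printed something other than JSON on stdout, possibly nothing at all. A minimal sketch for running the same command by hand and inspecting its raw output; the command line is copied from the error message, and `run_and_capture` is a hypothetical helper, not a Selenium API:

```python
import subprocess

# Command copied verbatim from the error message above.
SELENIUM_MANAGER_CMD = [
    "/var/lang/lib/python3.11/site-packages/selenium/webdriver/common/linux/selenium-manager",
    "--browser", "chrome",
    "--output", "json",
]


def run_and_capture(cmd: list[str]) -> dict:
    """Run cmd and return its exit code plus raw stdout and stderr."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing or non-executable binary.
        return {"returncode": None, "stdout": "", "stderr": repr(exc)}
    return {
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


if __name__ == "__main__":
    print(run_and_capture(SELENIUM_MANAGER_CMD))
```

Logging the returned dictionary from inside the Lambda handler would show whether the binary fails to start, exits with an error on stderr, or simply produces empty output.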
I don't really know what else I can try. Has anyone had the same problem, or had to do something similar, and managed to make it work?
I'm currently using Selenium v4.10.0 (I've tried some earlier versions too) and webdriver-manager v3.7.0 (I've tried v4.0.0 too).