2

I'd like to run a selenium script on AWS Lamda via a Docker container.

I'm using AWS EC2 to build and then locally test the container via AWS Lambda RIE. Once successfully tested, the container is registered on ECR so to feed AWS Lambda.

Despite RIE local test on EC2 always succeed, I can't manage to make Lambda working right. Lambda testing is currently always failing with the following error message:

{
  "errorMessage": "Message: session not created\nfrom tab crashed\n  (Session info: headless chrome=93.0.4577.63)\n",
  "errorType": "SessionNotCreatedException",
  "stackTrace": [
    "  File \"/var/task/app.py\", line 32, in handler\n    driver = webdriver.Chrome(\n",
    "  File \"/var/task/selenium/webdriver/chrome/webdriver.py\", line 76, in __init__\n    RemoteWebDriver.__init__(\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 157, in __init__\n    self.start_session(capabilities, browser_profile)\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 252, in start_session\n    response = self.execute(Command.NEW_SESSION, parameters)\n",
    "  File \"/var/task/selenium/webdriver/remote/webdriver.py\", line 321, in execute\n    self.error_handler.check_response(response)\n",
    "  File \"/var/task/selenium/webdriver/remote/errorhandler.py\", line 242, in check_response\n    raise exception_class(message, screen, stacktrace)\n"
  ]
}

Here you can find all the code I'm actually using:

Dockerfile

FROM public.ecr.aws/lambda/python:3.8

#Download and install Chrome
RUN curl https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm > ./google-chrome-stable_current_x86_64.rpm
RUN yum install -y ./google-chrome-stable_current_x86_64.rpm
RUN rm ./google-chrome-stable_current_x86_64.rpm

#Download and install chromedriver
RUN yum install -y unzip
RUN curl http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip > /tmp/chromedriver.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/local/bin/
RUN rm /tmp/chromedriver.zip
RUN yum remove -y unzip

#Upgrade pip and install python dependences
RUN pip3 install --upgrade pip
RUN pip3 install selenium --target "${LAMBDA_TASK_ROOT}"

#Copy app.py
COPY app.py ${LAMBDA_TASK_ROOT}

CMD ["app.handler"]

app.py


from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def handler(event, context):

    chrome_options = Options()
    
    chrome_options.add_argument("--allow-running-insecure-content")
    chrome_options.add_argument("--ignore-certificate-errors")
    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--no-sandbox")
    chrome_options.add_argument("--disable-dev-shm-usage")
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--disable-dev-tools")
    chrome_options.add_argument("--no-zygote")
    chrome_options.add_argument("--v=99")
    chrome_options.add_argument("--single-process")

    chrome_options.binary_location = '/usr/bin/google-chrome-stable'

    capabilities = webdriver.DesiredCapabilities().CHROME
    capabilities['acceptSslCerts'] = True
    capabilities['acceptInsecureCerts'] = True
        
    driver = webdriver.Chrome(
        executable_path='/usr/local/bin/chromedriver',
        options=chrome_options,
        desired_capabilities=capabilities)

    if driver:
    
        response = {
            "statusCode": 200,
            "body": json.dumps("Selenium Driver Initiated")
        }
    
        return response

Local Container Testing with RIE

$ docker run -p 9000:8080 aws-scraper
  results in > time="2021-09-03T15:24:13.269" level=info msg="exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)"

$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'
  results in > {"statusCode": 200, "body": "\"Selenium Driver Initiated\""}[

I really can't figure it out. I also tried to follow Selenium works on AWS EC2 but not on AWS Lambda, but to no avail.

Any help would be more than welcomed. Thank you in advance.

lbrb
  • 21
  • 4
  • the error that was presented in the thread you mentioned isn't the one you're encountering, or am I missing somthing? did you verify this config against for example the mentioned thread? – Noam Yizraeli Sep 03 '21 at 16:24

1 Answers1

0

Solved by borrowing dockerfile and selenium webdriver chrome options from this repo: https://github.com/rchauhan9/image-scraper-lambda-container.git

Dockerfile now looks like the following:

# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"


# Stage 1
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine

RUN apk add --no-cache \
    libstdc++

# Stage 2
FROM python-alpine AS build-image

RUN apk add --no-cache \
    build-base \
    libtool \
    autoconf \
    automake \
    libexecinfo-dev \
    make \
    cmake \
    libcurl

ARG FUNCTION_DIR
ARG RUNTIME_VERSION

RUN mkdir -p ${FUNCTION_DIR}

RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}


# Stage 3
FROM python-alpine as build-image2

ARG FUNCTION_DIR

WORKDIR ${FUNCTION_DIR}

COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}

RUN apk update \
    && apk add gcc python3-dev musl-dev \
    && apk add jpeg-dev zlib-dev libjpeg-turbo-dev

COPY requirements.txt .

RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}

# Stage 4
FROM python-alpine

ARG FUNCTION_DIR

WORKDIR ${FUNCTION_DIR}

COPY --from=build-image2 ${FUNCTION_DIR} ${FUNCTION_DIR}

RUN apk add jpeg-dev zlib-dev libjpeg-turbo-dev \
    && apk add chromium chromium-chromedriver

ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie

RUN chmod 755 /usr/bin/aws-lambda-rie

COPY app/* ${FUNCTION_DIR}
COPY entry.sh /

ENTRYPOINT [ "/entry.sh" ]

CMD [ "app.handler" ]

and app.py now looks like the following;

import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def handler(event, context):

    chrome_options = Options()
    
    chrome_options.add_argument('--autoplay-policy=user-gesture-required')
    chrome_options.add_argument('--disable-background-networking')
    chrome_options.add_argument('--disable-background-timer-throttling')
    chrome_options.add_argument('--disable-backgrounding-occluded-windows')
    chrome_options.add_argument('--disable-breakpad')
    chrome_options.add_argument('--disable-client-side-phishing-detection')
    chrome_options.add_argument('--disable-component-update')
    chrome_options.add_argument('--disable-default-apps')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--disable-domain-reliability')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_argument('--disable-features=AudioServiceOutOfProcess')
    chrome_options.add_argument('--disable-hang-monitor')
    chrome_options.add_argument('--disable-ipc-flooding-protection')
    chrome_options.add_argument('--disable-notifications')
    chrome_options.add_argument('--disable-offer-store-unmasked-wallet-cards')
    chrome_options.add_argument('--disable-popup-blocking')
    chrome_options.add_argument('--disable-print-preview')
    chrome_options.add_argument('--disable-prompt-on-repost')
    chrome_options.add_argument('--disable-renderer-backgrounding')
    chrome_options.add_argument('--disable-setuid-sandbox')
    chrome_options.add_argument('--disable-speech-api')
    chrome_options.add_argument('--disable-sync')
    chrome_options.add_argument('--disk-cache-size=33554432')
    chrome_options.add_argument('--hide-scrollbars')
    chrome_options.add_argument('--ignore-gpu-blacklist')
    chrome_options.add_argument('--ignore-certificate-errors')
    chrome_options.add_argument('--metrics-recording-only')
    chrome_options.add_argument('--mute-audio')
    chrome_options.add_argument('--no-default-browser-check')
    chrome_options.add_argument('--no-first-run')
    chrome_options.add_argument('--no-pings')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--no-zygote')
    chrome_options.add_argument('--password-store=basic')
    chrome_options.add_argument('--use-gl=swiftshader')
    chrome_options.add_argument('--use-mock-keychain')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--headless')

    chrome_options.add_argument('--user-data-dir={}'.format('/tmp/user-data'))
    chrome_options.add_argument('--data-path={}'.format('/tmp/data-path'))
    chrome_options.add_argument('--homedir={}'.format('/tmp'))
    chrome_options.add_argument('--disk-cache-dir={}'.format('/tmp/cache-dir'))
        
    driver = webdriver.Chrome(
        executable_path='/usr/bin/chromedriver',
        options=chrome_options)

    if driver:
        print("Selenium Driver Initiated")
    
    response = {
        "statusCode": 200,
        "body": json.dumps(html, ensure_ascii=False)
    }

    return response

To be honest I still can't figure out why those modifications did the job. Any idea about that is more than welcomed!

Thank you all again for help and support

lbrb
  • 21
  • 4