I have a Selenium parser that needs to run in Docker. On my local machine the script works correctly. Inside a container, Selenium does not seem to work: whenever I search for any element, I get an error saying the element cannot be found. From this I conclude that either Selenium does not run inside Docker or it cannot talk to the Chrome browser. I tried installing Chrome and ChromeDriver inside the container, and I also tried a remote driver running in another container. The result is always the same. My highest priority is to run without a remote driver. Looking forward to your advice, thanks everyone!

My Dockerfile:

FROM python:3.10-slim-buster

RUN mkdir -p /usr/src/app/
WORKDIR /usr/src/app/

COPY . /usr/src/app/

RUN pip install --no-cache-dir -r requirements.txt

RUN  apt-get update \
  && apt-get install -y wget \
  && apt-get install -y gnupg2 \
  && apt-get install -y curl \
  && rm -rf /var/lib/apt/lists/*

# install google chrome
RUN wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
RUN sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list'
RUN apt-get -y update
RUN apt-get install -y google-chrome-stable

# install chromedriver
RUN apt-get install -yqq unzip
RUN wget -O /tmp/chromedriver.zip http://chromedriver.storage.googleapis.com/`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE`/chromedriver_linux64.zip
RUN unzip /tmp/chromedriver.zip chromedriver -d /usr/src/app/chromedriver/


CMD python3 ./script.py

My Python script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

service = Service(r'chromedriver/chromedriver')
options = webdriver.ChromeOptions()

options.add_argument('--no-sandbox')

# webdriver mode
options.add_argument('--disable-blink-features=AutomationControlled')
# user-agent
options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                     "Chrome/103.0.0.0 Safari/537.36")

options.add_argument('--window-size=1420,1080')
# headless mode
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# incognito mode
options.add_argument("--incognito")

options.add_experimental_option("excludeSwitches", ["enable-logging"])

driver = webdriver.Chrome(
    service=service,
    options=options
)

Eldellano
  • Could you provide your `requirements.txt` as well? – Martin Tovmassian Jul 28 '22 at 11:31
  • Why don't you just use a Docker image provided by Selenium? It contains all the features necessary to run your tests. https://hub.docker.com/u/selenium – Tork Jul 28 '22 at 11:35
  • @MartinTovmassian requirements.txt --> selenium==4.3.0 beautifulsoup4==4.11.1 lxml==4.9.1 requests==2.28.1 uvicorn==0.18.2 fastapi==0.79.0 celery==5.2.7 flower==1.1.0 fake_user_agent==0.0.15 chromedriver-binary==103.0.5060.134.0 – Eldellano Jul 28 '22 at 12:17
  • @Tork When I use selenium/standalone-chrome I get the following - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=4444): Max retries exceeded with url: /session (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused')) – Eldellano Jul 28 '22 at 12:20
  • @Tork In the script I define the driver like this driver = webdriver.Remote("http://127.0.0.1:4444", options=options) – Eldellano Jul 28 '22 at 12:23
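
A note on the last two comments: if script.py itself runs in a container, 127.0.0.1 points at that same container, not at the one running selenium/standalone-chrome, which would explain the refused connection. Below is a sketch of one way around it. It assumes both containers share a Docker network and that the Selenium container is reachable under the hypothetical hostname selenium. The environment variable SELENIUM_REMOTE_URL and both helper names are also invented for this illustration.

```python
import os


def remote_url(default="http://selenium:4444"):
    """Return the Selenium server URL, overridable via SELENIUM_REMOTE_URL (hypothetical name)."""
    return os.environ.get("SELENIUM_REMOTE_URL", default)


def make_remote_driver():
    """Connect to the standalone Chrome container instead of 127.0.0.1."""
    # Imported here so the URL helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=remote_url(), options=options)
```

If the script runs directly on the host and the Selenium container publishes port 4444, the original 127.0.0.1 address should be fine and this change is not needed.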

2 Answers

I don't know your exact use case or the actual error you get, but I took your Dockerfile and your Python script and tried to run the Selenium Getting Started example with them.

I have just added these two lines to your script:

driver.get("http://www.python.org")
print("Python" in driver.title)

On the first run I got this error:

Traceback (most recent call last):
  File "/usr/src/app/script.py", line 29, in <module>
    driver.get("http://www.python.org")
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
  (Session info: headless chrome=103.0.5060.134)
Stacktrace:

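Based on this answer, I fixed the issue by adding this argument: options.add_argument("--disable-dev-shm-usage"). It tells Chrome to write its shared memory files to /tmp instead of /dev/shm, which is very small in a default Docker container.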

And then the script worked as expected.
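
For reference, here is a minimal sketch of how the option setup from the question might look with the extra argument added. It assumes the chromedriver path from the question's Dockerfile. The names CHROME_ARGUMENTS and make_driver are made up for this illustration, and Selenium is imported inside the function only so that the list can be inspected without a browser installed.

```python
# Chrome arguments taken from the question, plus the one that fixed the crash.
CHROME_ARGUMENTS = [
    "--no-sandbox",
    "--headless",
    "--disable-gpu",
    "--window-size=1420,1080",
    "--disable-dev-shm-usage",  # the fix: do not rely on the container's small /dev/shm
]


def make_driver(chromedriver_path="chromedriver/chromedriver"):
    """Start headless Chrome with the arguments above (hypothetical helper)."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for argument in CHROME_ARGUMENTS:
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(chromedriver_path), options=options)
```

Calling make_driver() and then driver.get("http://www.python.org") should repeat the check described above.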

Martin Tovmassian

docker run --shm-size=1gb image_name

or

In the ChromeOptions in main.py (or whatever your script is called), disable /dev/shm usage with options.add_argument("--disable-dev-shm-usage").

Basically, Chrome crashes under Selenium because the container's shared memory (/dev/shm) is too small; Docker's default is 64 MB.
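
To check whether this is what is happening, you can look at the size of /dev/shm from inside the container. Here is a rough sketch using only the standard library. The helper names and the 512 MiB threshold are arbitrary choices for this illustration, not values taken from the Chrome or Docker documentation.

```python
import os
import shutil


def shm_size_mib(path="/dev/shm"):
    """Return the total size of the shared-memory mount in MiB, or None if it is missing."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total / (1024 * 1024)


def extra_chrome_arguments(size_mib, minimum_mib=512):
    """Suggest --disable-dev-shm-usage when /dev/shm is missing or below the threshold.

    The 512 MiB default is an assumption made for this sketch.
    """
    if size_mib is None or size_mib < minimum_mib:
        return ["--disable-dev-shm-usage"]
    return []


size = shm_size_mib()
print("/dev/shm size in MiB:", size)
print("extra Chrome arguments:", extra_chrome_arguments(size))
```

In a container started with --shm-size=1gb the first line should report roughly 1024 and the suggested list should be empty.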

XouDo