I have been trying for over a week to dockerize a Flask application that uses a MongoDB database. My application has the following structure:
application
- db
    - Dockerfile
- web
    - app.py
    - Dockerfile
    - requirements.txt
- docker-compose.yml
This application does web scraping with the selenium, selenium-requests and BeautifulSoup libraries. The error I have been getting for a week is:
selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
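The DevToolsActivePort error is commonly reported when Chrome is launched as root inside a container (the default user in the python:3.8 image) without --no-sandbox, or when Chrome crashes because Docker's default /dev/shm is only 64 MB. The sketch below collects the flags usually suggested for that situation. It assumes the container keeps the default root user and shared-memory size, and CONTAINER_CHROME_FLAGS and chrome_flags are hypothetical names, not part of Selenium.

```python
# Hypothetical helper: flags commonly suggested for headless Chrome in Docker.
# Assumes the container runs as root with Docker's default /dev/shm size.
CONTAINER_CHROME_FLAGS = [
    "--headless",               # no display server inside the container
    "--no-sandbox",             # Chrome refuses to start as root with the sandbox enabled
    "--disable-dev-shm-usage",  # write shared-memory files to /tmp instead of the small /dev/shm
]


def chrome_flags(extra=None):
    """Return the container flags plus any extra ones, skipping duplicates."""
    flags = list(CONTAINER_CHROME_FLAGS)
    for flag in extra or []:
        if flag not in flags:
            flags.append(flag)
    return flags


if __name__ == "__main__":
    # With Selenium installed, each flag would be passed as:
    #     chrome_options.add_argument(flag)
    for flag in chrome_flags(["--window-size=1920,1080"]):
        print(flag)
```

With Selenium installed, each returned flag would be passed to chrome_options.add_argument() before the Chrome instance is created.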
I have tried several different ways of writing the application's Dockerfile, but I can't find one that works.
My code is as follows:
- app.py
import json
import re

from bs4 import BeautifulSoup
from seleniumrequests import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from flask import Flask, request, jsonify
from flask_restful import Api, Resource
from pymongo import MongoClient

app = Flask(__name__)
api = Api(app)

client = MongoClient("mongodb://db:27017")
db = client.FlaskAPI
users = db["users"]

# url, username, password and remove_formatation_elements() come from
# the rest of the code, which is not shown here.
chrome_options = Options()
chrome_options.add_argument("--headless")
webdriver = Chrome(executable_path='/usr/local/bin/chromedriver', options=chrome_options)

# Log in to the site
webdriver.get(url + 'login')
webdriver.find_element(By.ID, "userName-id").send_keys(username)
webdriver.find_element(By.ID, "passWd-id").send_keys(password)
webdriver.find_element(By.XPATH, '//input[@value="Login"]').click()

# Find the inline script that defines userParam and store it in MongoDB
soup = BeautifulSoup(webdriver.page_source, 'lxml')
data = ''
p = re.compile('var userParam=(.*);')
for script in soup.find_all("script", {"src": False}):
    # script.string is None for an empty <script> tag
    if script.string and p.search(script.string):
        data = json.dumps(str(script.string))
data = remove_formatation_elements(data)
users.insert_one({"data": data})
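To check the extraction step separately from the browser, the same regular expression can be run against plain strings. This is a minimal sketch under two assumptions: the userParam value is valid JSON, and only the captured group is wanted (rather than the whole script, which is what json.dumps(str(script.string)) stores above). The sample scripts and the extract_user_param name are hypothetical.

```python
import json
import re

# Same pattern as in app.py: capture what sits between "var userParam=" and the last ";" on the line.
USER_PARAM = re.compile(r"var userParam=(.*);")


def extract_user_param(script_texts):
    """Return the captured userParam value from the first script that defines it, or None."""
    for text in script_texts:
        if not text:  # script.string can be None, e.g. for an empty <script> tag
            continue
        match = USER_PARAM.search(text)
        if match:
            return match.group(1)
    return None


# Hypothetical inline scripts standing in for soup.find_all("script", {"src": False}).
scripts = [None, "console.log('hi');", 'var userParam={"id": 7, "name": "ana"};']
raw = extract_user_param(scripts)
print(raw)              # {"id": 7, "name": "ana"}
print(json.loads(raw))  # {'id': 7, 'name': 'ana'}
```

Parsing the captured group with json.loads gives a dict that could be stored directly with insert_one, instead of a JSON-encoded string of the whole script.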
- db Dockerfile
FROM mongo:5.0.3
- web Dockerfile
FROM python:3.8
WORKDIR /usr/src/app

# Install Chrome WebDriver
RUN CHROMEDRIVER_VERSION=`curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE` && \
    mkdir -p /opt/chromedriver-$CHROMEDRIVER_VERSION && \
    curl -sS -o /tmp/chromedriver_linux64.zip http://chromedriver.storage.googleapis.com/$CHROMEDRIVER_VERSION/chromedriver_linux64.zip && \
    unzip -qq /tmp/chromedriver_linux64.zip -d /opt/chromedriver-$CHROMEDRIVER_VERSION && \
    rm /tmp/chromedriver_linux64.zip && \
    chmod +x /opt/chromedriver-$CHROMEDRIVER_VERSION/chromedriver && \
    ln -fs /opt/chromedriver-$CHROMEDRIVER_VERSION/chromedriver /usr/local/bin/chromedriver

# Install Google Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - && \
    echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
    apt-get -yqq update && \
    apt-get -yqq install google-chrome-stable && \
    rm -rf /var/lib/apt/lists/*

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
- requirements.txt
flask
flask-restful
pymongo
bs4
selenium
selenium-requests
lxml
webdriver-manager
and
- docker-compose.yml
version: '3.9'
services:
  web:
    build: './web'
    ports:
      - '5000:5000'
    links:
      - db
  db:
    build: './db'
With the web folder's Dockerfile, chromedriver is installed in /usr/local/bin, and the google-chrome and google-chrome-stable binaries are installed in /usr/bin.
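Since chromedriver is downloaded from LATEST_RELEASE independently of the google-chrome-stable package, it may help to confirm inside the container that both binaries are on the PATH and share the same major version. The following is a hypothetical diagnostic script, not part of the original project. A version mismatch usually produces a different error from the one above, so this only rules that cause out.

```python
import re
import shutil
import subprocess

# Matches a dotted four-part version such as 96.0.4664.45 and captures the major part.
VERSION = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output):
    """Extract the major version from output like 'Google Chrome 96.0.4664.45'."""
    match = VERSION.search(version_output)
    if match is None:
        raise ValueError(f"no version found in {version_output!r}")
    return int(match.group(1))


def installed_major_version(binary):
    """Run '<binary> --version' and return its major version, or None if the binary is missing."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return major_version(result.stdout)


if __name__ == "__main__":
    chrome = installed_major_version("google-chrome")
    driver = installed_major_version("chromedriver")
    print(f"google-chrome major version: {chrome}")
    print(f"chromedriver major version: {driver}")
    if chrome is not None and driver is not None and chrome != driver:
        print("Mismatch: ChromeDriver's major version should match Chrome's.")
```

It could be run in the web service with docker-compose run --rm web python check_chrome.py, where check_chrome.py is a hypothetical file name for the script placed in the web folder.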
So I can't understand why, even after so many attempts, this code still doesn't work. Could someone please help me?