FROM apache/airflow:2.2.4

# install mongodb-org-tools - mongodb tools for up-to-date mongodb that can handle --uri=mongodb+srv: flag
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && \
    apt-get install -y mongodb-org-tools

ADD requirements.txt /requirements.txt
RUN pip install -r /requirements.txt

We need to use MongoDB CLI commands such as mongoimport and mongoexport in a BashOperator in our Airflow project, since our workflow involves moving data into a MongoDB database. We strongly prefer mongo commands like mongoimport over the Python pymongo package.

When we build the image, it seems we do not have permission to install the MongoDB tools; the build fails with the following error:

=> ERROR [cbb-airflow_airflow-webserver 2/4] RUN apt-get update && apt-get install -y gnupg software-properties-common &&     curl -fsSL https://www.  0.6s
------
 > [cbb-airflow_airflow-webserver 2/4] RUN apt-get update && apt-get install -y gnupg software-properties-common &&     curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - &&     add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' &&     apt-get update &&     apt-get install -y mongodb-org-tools:
#0 0.460 Reading package lists...
#0 0.592 E: List directory /var/lib/apt/lists/partial is missing. - Acquire (13: Permission denied)
------
failed to solve: executor failed running [/bin/bash -o pipefail -o errexit -o nounset -o nolog -c apt-get update && apt-get install -y gnupg software-properties-common &&     curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - &&     add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' &&     apt-get update &&     apt-get install -y mongodb-org-tools]: exit code: 100

What is the best way to install the MongoDB CLI (for commands like mongoimport) on top of the official apache/airflow Docker image?

Canovice

1 Answer


Add USER root after the FROM statement.

The updated Dockerfile will look like this:

FROM apache/airflow:2.2.4

USER root

# install mongodb-org-tools: up-to-date MongoDB tools that can handle the --uri=mongodb+srv: flag
RUN apt-get update && apt-get install -y gnupg software-properties-common && \
    curl -fsSL https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
    add-apt-repository 'deb https://repo.mongodb.org/apt/debian buster/mongodb-org/4.2 main' && \
    apt-get update && \
    apt-get install -y mongodb-org-tools && \
    rm -rf /var/lib/apt/lists/*

# switch back to the non-root airflow user so pip installs packages where Airflow can find them
USER airflow

COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

TL;DR

The user is set to airflow (UID 50000) in the apache/airflow:2.2.4 Docker image. We can confirm this by looking at the 49th instruction in the image's Dockerfile.
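If the image's Dockerfile isn't at hand, a throwaway build can show which user the RUN steps execute as. This is a minimal diagnostic sketch, not part of the fix; build it with "docker build --no-cache --progress=plain ." so the RUN output is printed. It should report the airflow user rather than 0/root.

```dockerfile
FROM apache/airflow:2.2.4

# Diagnostic only: print the numeric UID and the name of the user
# that RUN instructions execute as in this base image.
# Expect the airflow user here, not 0/root.
RUN id -u && whoami
```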

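As a result, every subsequent RUN instruction executes as the airflow user, which has restricted access: it cannot write to system directories such as /var/lib/apt/lists, which is why apt-get fails with "Permission denied".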

To overcome this problem, explicitly switch to the root user for the build steps that install system packages. This resolves the permission error above.

Kapil Khandelwal
  • All makes sense, confirming this worked on my end! Thank you – Canovice Jul 28 '22 at 13:41
  • Do I need EXPOSE 27017 and ENTRYPOINT ["/usr/bin/mongod", "--bind_ip_all"] at the end? Other posts I have read online always add EXPOSE 27017. – Quang Hoàng Minh Nov 17 '22 at 01:49
  • @QuangHoàngMinh It depends on your use case. If you want to access MongoDB from other Docker containers, you can consider adding the EXPOSE instruction. Refer to https://stackoverflow.com/a/22150099/8405123 for more details. The ENTRYPOINT instruction starts the MongoDB process; ideally, add it at the end of the Dockerfile. – Kapil Khandelwal Nov 17 '22 at 18:26
  • @KapilKhandelwal Do you remember how to start the mongod service in the airflow-webserver container? I found that "mongodb-org-tools/buster, now 4.2.23 amd64 [installed]" is installed, but I can't start the mongod service no matter what. I have tried many approaches from the internet without success; it always says the mongod service is not found. When I looked in /usr/bin, I only found mongoimport, mongotop,... but not mongod. – Quang Hoàng Minh Nov 19 '22 at 07:45
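The missing mongod binary is expected: in MongoDB's apt repository, mongodb-org-tools only ships the database tools (mongoimport, mongoexport, mongodump, mongotop, and so on), while mongod comes from the separate mongodb-org-server package. Also note that an ENTRYPOINT added at the end of a Dockerfile built FROM apache/airflow replaces the image's own entrypoint, so the Airflow webserver or scheduler would no longer start from that image. A common alternative is to run MongoDB in its own container and let the Airflow tasks connect to it over the Docker network. A minimal sketch, assuming the official mongo:4.2 image is acceptable; how the container is wired into your compose setup (network, service name) is not shown and is up to you:

```dockerfile
# Hypothetical standalone image for the database service, run as its own
# container next to the Airflow services rather than built on apache/airflow,
# so the Airflow image keeps its own ENTRYPOINT.
FROM mongo:4.2

# The official mongo image already declares port 27017 and runs mongod by default;
# both are restated here only to make them explicit. --bind_ip_all lets other
# containers on the same Docker network connect.
EXPOSE 27017
CMD ["mongod", "--bind_ip_all"]
```

With such a container reachable under a hypothetical hostname like mongo, a BashOperator task could call something like mongoimport --uri=mongodb://mongo:27017/mydb --collection=items --file=/path/to/items.json, where the hostname, database, collection, and file path are all placeholders.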