I’m attempting to run Airflow in Docker with docker-compose from inside another container. This container is created by default by an Azure Machine Learning compute cluster that I want to use to run my Airflow DAG.
The problem is that I get the following error when I try to execute docker-compose build through the ScriptRunConfig class (see the script below).
error creating aufs mount to /var/lib/docker/aufs/mnt/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init: mount target=/var/lib/docker/aufs/mnt/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init data=br:/var/lib/docker/aufs/diff/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init=rw:/var/lib/docker/aufs/diff/d6e643e28d8729f8972b78bedd2de1de602879b65ff56e07e76a236d3096b709=ro+wh:/var/lib/docker/aufs/diff/8d3dde63564e16508dbf64e7baae79a6b2913b3262cb6dba7027e1fdf8bb6f8f=ro+wh:/var/lib/docker/aufs/diff/fa2e9c9ac8ff7af5f5d35e4d2081fac4bb5139d760675e9608d2b8d04f096837=ro+wh:/var/lib/docker/aufs/diff/d34747b3a1646124aa7a7a0fed4790e1264667aa58fc4de2288a10e4b68673ce=ro+wh:/var/lib/docker/aufs/diff/ea46ddfc022eb268aa428376132bc52b00b317e747d63078a38d044bae5d48ec=ro+wh:/var/lib/docker/aufs/diff/b3c155d801b49d8252a3926493328d471c8f4cfd72c553771374e3952a999d95=ro+wh:/var/lib/docker/aufs/diff/a067d04104f7a70c6194a3e742be72fc221759b18628285e7fd16a2d678120f3=ro+wh:/var/lib/docker/aufs/diff/3db994298571c09ee3d40abf21f657f9c8650a6fe0ea2c6c6c7590eb7c6c712f=ro+wh:/var/lib/docker/aufs/diff/273fc331f9ebae1d0a01963d46bf9edca6d91a089d4791066cb170954eb7609c=ro+wh:/var/lib/docker/aufs/diff/419a894bbee2b9b8ec05deed26dcfc21f234276e06d765d03ed939b918d3908f=ro+wh:/var/lib/docker/aufs/diff/e91d472c4e53f2a2eae97aca59f7dcacdf57a4b22d64991c348528e4081500d6=ro+wh:/var/lib/docker/aufs/diff/c23cc64e903b5254c19446e8ddc6db0253cbd19e239860c1dc95440ca65aae94=ro+wh:/var/lib/docker/aufs/diff/5794fefdefed444bf17de20f6c3ecf93743cccef83169c94aba62ec902c8380f=ro+wh,dio,xino=/dev/shm/aufs.xino: invalid argument
What I tried so far:
- Discarded option: using a docker:dind custom image in which I installed curl and docker-compose (see the sketch after this list). The problem is that the image build fails because:
  - it seems that Azure Machine Learning adds extra steps to the Dockerfile to set up the conda environment needed for my project, but conda is not installed by default in the docker:dind image;
  - the official docker image is based on Alpine Linux, whereas Azure Machine Learning is only compatible with system specifications that include Ubuntu, Conda, etc., as per the documentation.
- The option presented in this question: using an image already available to Azure Machine Learning in which I installed the Docker engine and docker-compose. The image builds successfully, but I get the error shown above when the compute cluster executes my script.
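For reference, here is roughly how the discarded option 1 was wired into the AML environment (a sketch only; the name Dockerfile_dind is hypothetical and stands for the custom Dockerfile based on docker:dind with curl and docker-compose added):
from azureml.core import Environment
# Discarded attempt: point the AML environment at a Dockerfile based on docker:dind.
# The image build fails at this stage because AML appends conda setup steps,
# and conda is not available in the Alpine-based docker:dind image.
env_dind = Environment(name="env-dind-discarded")
env_dind.docker.base_image = None
env_dind.docker.base_dockerfile = "./Dockerfile_dind"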
Here's the Dockerfile for the custom image (Ubuntu-based) used by the compute cluster to set up the environment. It is referred to as Dockerfile_cluster in the Python script below.
FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20211029.v1
USER root
# install docker
RUN apt-get update -y \
    && apt-get install -y \
        ca-certificates \
        curl \
        gnupg \
        lsb-release \
    && mkdir -p /etc/apt/keyrings \
    && curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg \
    && echo \
        "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
        $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
    && apt-get update -y \
    && apt-get install -y docker-ce docker-ce-cli containerd.io
# install docker-compose
RUN curl -L https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
RUN chmod +x /usr/local/bin/docker-compose
RUN ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
And my docker-compose.yml file (it uses a different Dockerfile than the one above):
services:
  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5434:5432"

  init_db:
    build:
      context: .
      dockerfile: Dockerfile
    command: bash -c "airflow db init && airflow db upgrade"
    env_file: .env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - postgres

  scheduler:
    build:
      context: .
      dockerfile: Dockerfile
    restart: on-failure
    command: bash -c "airflow scheduler"
    env_file: .env
    depends_on:
      - postgres
    ports:
      - "8080:8793"
    volumes:
      - ./airflow_dags:/opt/airflow/dags
      - ./data:/opt/airflow/data
      - ./.git:/opt/airflow/.git
      - ./conf:/opt/airflow/conf
      - ./airflow_logs:/opt/airflow/logs
      - /var/run/docker.sock:/var/run/docker.sock
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3

  webserver:
    build:
      context: .
      dockerfile: Dockerfile
    hostname: webserver
    restart: always
    env_file: .env
    depends_on:
      - postgres
    command: bash -c "airflow users create -r Admin -u admin -e admin@example.com -f admin -l user -p admin && airflow webserver"
    volumes:
      - ./airflow_dags:/opt/airflow/dags
      - ./data:/opt/airflow/data
      - ./.git:/opt/airflow/.git
      - ./conf:/opt/airflow/conf
      - ./airflow_logs:/opt/airflow/logs
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "5000:8080"
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 32
And finally, the script I use to submit my experiment to the compute cluster.
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.authentication import ServicePrincipalAuthentication
# Create an environment from conda reqs
env = Environment.from_conda_specification(name="env_name", file_path="./src/conda.yml")
# Use custom image from Dockerfile
env.docker.base_image = None
env.docker.base_dockerfile = "./Dockerfile_cluster"
# Instantiate ServicePrincipalAuth object
svc_pr = ServicePrincipalAuthentication(tenant_id=tenant_id,
                                        service_principal_id=sp_id,
                                        service_principal_password=sp_pwd)
# Instantiate AML Workspace object
ws = Workspace(
    subscription_id=sub_id,
    resource_group=rg_name,
    workspace_name=ws_name,
    auth=svc_pr
)
command ="bash -c 'service docker start && docker-compose build --no-cache'".split()
experiment = Experiment(workspace=ws, name='exp-test')
config = ScriptRunConfig(source_directory='.', command = command, compute_target='cpt-cluster', environment=env)
# Submit experiment
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
run.wait_for_completion(show_output=True)
In the script above, I only tried to build the images first, before running the services.
Normally, the commands I use to run my Airflow DAG locally or on a compute instance are the following (a sketch of the equivalent ScriptRunConfig command is shown after them):
docker-compose build --no-cache
docker-compose up postgres
In another terminal:
docker-compose up init_db
docker-compose up scheduler webserver
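Once the build works, I plan to chain the same steps in the command passed to ScriptRunConfig. A rough, untested sketch of what I have in mind (the -d flags are my assumption, only so the long-running services don't block the job):
# Hypothetical full command, mirroring the local workflow above:
# start Docker, build the images, then bring the services up in order.
full_command = ("bash -c 'service docker start "
                "&& docker-compose build --no-cache "
                "&& docker-compose up -d postgres "
                "&& docker-compose up init_db "
                "&& docker-compose up -d scheduler webserver'").split()
config = ScriptRunConfig(source_directory='.', command=full_command,
                         compute_target='cpt-cluster', environment=env)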
Thank you very much for your help.