
I’m attempting to run Airflow in Docker with docker-compose from inside another container. This container is created by default by an Azure Machine Learning compute cluster that I want to use to run my Airflow DAG.

The problem is that I get the following error when I try to execute `docker-compose build` through the ScriptRunConfig class (see the script below).

error creating aufs mount to /var/lib/docker/aufs/mnt/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init: mount target=/var/lib/docker/aufs/mnt/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init data=br:/var/lib/docker/aufs/diff/3d54c61133023e1d3700ce9746529dc8328e739346d4123d632951f11acdf122-init=rw:/var/lib/docker/aufs/diff/d6e643e28d8729f8972b78bedd2de1de602879b65ff56e07e76a236d3096b709=ro+wh:/var/lib/docker/aufs/diff/8d3dde63564e16508dbf64e7baae79a6b2913b3262cb6dba7027e1fdf8bb6f8f=ro+wh:/var/lib/docker/aufs/diff/fa2e9c9ac8ff7af5f5d35e4d2081fac4bb5139d760675e9608d2b8d04f096837=ro+wh:/var/lib/docker/aufs/diff/d34747b3a1646124aa7a7a0fed4790e1264667aa58fc4de2288a10e4b68673ce=ro+wh:/var/lib/docker/aufs/diff/ea46ddfc022eb268aa428376132bc52b00b317e747d63078a38d044bae5d48ec=ro+wh:/var/lib/docker/aufs/diff/b3c155d801b49d8252a3926493328d471c8f4cfd72c553771374e3952a999d95=ro+wh:/var/lib/docker/aufs/diff/a067d04104f7a70c6194a3e742be72fc221759b18628285e7fd16a2d678120f3=ro+wh:/var/lib/docker/aufs/diff/3db994298571c09ee3d40abf21f657f9c8650a6fe0ea2c6c6c7590eb7c6c712f=ro+wh:/var/lib/docker/aufs/diff/273fc331f9ebae1d0a01963d46bf9edca6d91a089d4791066cb170954eb7609c=ro+wh:/var/lib/docker/aufs/diff/419a894bbee2b9b8ec05deed26dcfc21f234276e06d765d03ed939b918d3908f=ro+wh:/var/lib/docker/aufs/diff/e91d472c4e53f2a2eae97aca59f7dcacdf57a4b22d64991c348528e4081500d6=ro+wh:/var/lib/docker/aufs/diff/c23cc64e903b5254c19446e8ddc6db0253cbd19e239860c1dc95440ca65aae94=ro+wh:/var/lib/docker/aufs/diff/5794fefdefed444bf17de20f6c3ecf93743cccef83169c94aba62ec902c8380f=ro+wh,dio,xino=/dev/shm/aufs.xino: invalid argument

What I tried so far:

  1. Discarded option: using a docker:dind custom image in which I installed curl and docker-compose. This fails when the environment image is built because:
    • Azure Machine Learning appends its own steps to the Dockerfile to set up the conda environment needed for my project, but conda is not installed by default in the docker:dind image.
    • The official Docker image is based on Alpine Linux, whereas Azure Machine Learning only supports base images that meet its system requirements (Ubuntu, Conda, …), as per the documentation.
  2. The option presented in this question: using an image already compatible with Azure Machine Learning, in which I installed the Docker engine and docker-compose. The image builds successfully, but I get the error shown above while the compute cluster executes my script.

Here's the Dockerfile for the custom image (Ubuntu-based) used by the compute cluster to set up the environment. It is referred to as Dockerfile_cluster in the Python script below.

FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20211029.v1

USER root

# install docker
RUN apt-get update -y \
 && apt-get install -y \
    ca-certificates \
    curl \
    gnupg \
    lsb-release \
 && mkdir -p /etc/apt/keyrings \
 && curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg \
 && echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
 && apt-get update -y \
 && apt-get install -y docker-ce docker-ce-cli containerd.io

# install docker-compose
RUN curl -L https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m) -o /usr/local/bin/docker-compose
RUN chmod +x /usr/local/bin/docker-compose
RUN ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose
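
The engine installed by this image does not start automatically when the container boots (see the comments at the end), so it has to be started by hand before anything else. A minimal sanity check, assuming the sysvinit service script shipped with the docker-ce package:

# start the engine manually, then confirm docker and docker-compose are usable
service docker start
docker info            # note the "Storage Driver" line in the output
docker-compose version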

And here is my docker-compose.yml file (it uses a different Dockerfile than the one above):

services:
  postgres:
    image: postgres:13
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5434:5432"
  init_db:
    build:
      context: .
      dockerfile: Dockerfile
    command: bash -c "airflow db init && airflow db upgrade"
    env_file: .env
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock  
    depends_on:
      - postgres
  scheduler:
    build:
      context: .
      dockerfile: Dockerfile
    restart: on-failure
    command: bash -c "airflow scheduler"
    env_file: .env
    depends_on:
      - postgres
    ports:
      - "8080:8793"
    volumes:
      - ./airflow_dags:/opt/airflow/dags
      - ./data:/opt/airflow/data
      - ./.git:/opt/airflow/.git
      - ./conf:/opt/airflow/conf
      - ./airflow_logs:/opt/airflow/logs
      - /var/run/docker.sock:/var/run/docker.sock
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
  webserver:
    build:
      context: .
      dockerfile: Dockerfile
    hostname: webserver
    restart: always
    env_file: .env
    depends_on:
      - postgres
    command: bash -c "airflow users create -r Admin -u admin -e admin@example.com -f admin -l user -p admin && airflow webserver"
    volumes:
      - ./airflow_dags:/opt/airflow/dags
      - ./data:/opt/airflow/data
      - ./.git:/opt/airflow/.git
      - ./conf:/opt/airflow/conf
      - ./airflow_logs:/opt/airflow/logs
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "5000:8080"
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 32
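
(For what it's worth, the compose file can be checked for syntax problems locally before anything is submitted to the cluster, using docker-compose's built-in validation:)

# parse and print the effective configuration; exits with an error if the file is malformed
docker-compose config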

And finally the script I use to submit my experiment to the compute cluster.

from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.authentication import ServicePrincipalAuthentication

# Create an environment from conda reqs
env = Environment.from_conda_specification(name="env_name", file_path="./src/conda.yml")

# Use custom image from Dockerfile
env.docker.base_image = None
env.docker.base_dockerfile = "./Dockerfile_cluster"


# Instantiate ServicePrincipalAuth object
svc_pr = ServicePrincipalAuthentication(tenant_id=tenant_id,
       service_principal_id=sp_id,
       service_principal_password=sp_pwd)

# Instantiate AML Workspace object
ws = Workspace(
       subscription_id=sub_id,
       resource_group=rg_name,
       workspace_name=ws_name,
       auth=svc_pr
       )

command = "bash -c 'service docker start && docker-compose build --no-cache'".split()
experiment = Experiment(workspace=ws, name='exp-test')

config = ScriptRunConfig(source_directory='.', command=command, compute_target='cpt-cluster', environment=env)

# Submit experiment
run = experiment.submit(config)
aml_url = run.get_portal_url()
print(aml_url)
run.wait_for_completion(show_output=True)

In the script above, I only try to build the images first, before running the services.

Normally, the commands I use to run my Airflow DAG locally or on a compute instance are the following (a single chained version for the cluster is sketched after the second list):

  1. docker-compose build --no-cache
  2. docker-compose up postgres

In another terminal:

  1. docker-compose up init_db
  2. docker-compose up scheduler webserver
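
Since ScriptRunConfig only accepts a single command, I would eventually have to chain these steps into one. A rough sketch of what that one-shot command could look like (running postgres detached with -d so the remaining steps can run in the same shell is my assumption; I have not gotten past the build step on the cluster):

bash -c 'service docker start \
  && docker-compose build --no-cache \
  && docker-compose up -d postgres \
  && docker-compose up init_db \
  && docker-compose up scheduler webserver'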

Thank you very much for your help.

Downforu
  • What do you get as the storage driver in the output of the `docker info` command when it is run through `ScriptRunConfig`? Try making the storage driver used by the Docker daemon running in the container the same as the one used by the daemon running on the compute target. – Oluwafemi Sule May 25 '22 at 16:19
  • Here's what I get for the storage driver used by the daemon running on the compute target: `Storage Driver: overlay2`. And for the daemon in the container: `Storage Driver: aufs`. Which one should I change, and how? Thanks – Downforu May 25 '22 at 21:12
  • Change the one for the Docker daemon running in the container to overlay2. https://docs.docker.com/storage/storagedriver/overlayfs-driver/#configure-docker-with-the-overlay-or-overlay2-storage-driver – Oluwafemi Sule May 26 '22 at 08:26
  • Unfortunately it didn't work. The Docker engine simply does not restart after I create the daemon.json file with {"storage-driver": "overlay2"} (what I tried is sketched below these comments). Following the [docs](https://docs.docker.com/storage/storagedriver/select-storage-driver/#supported-backing-filesystems), it seems that only `xfs` and `ext4` are supported as backing filesystems for the `overlay2` storage driver. There are also two SO questions about that: [here](https://stackoverflow.com/questions/67953609/overlay2-driver-not-supported) and [here](https://stackoverflow.com/questions/70631927/using-overlay2-storage-driver-with-an-overlay-filesystem). – Downforu May 26 '22 at 19:54
  • I don't know if it is relevant, but the Docker daemon inside the container does not start automatically on boot, so I need to execute `service docker start` before I can do `docker info`, etc. – Downforu May 26 '22 at 19:58
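
For completeness, here is roughly what I did for the storage-driver change discussed in the comments above (standard daemon.json location; the restart is the step that fails for me):

# write the daemon configuration suggested in the comments
mkdir -p /etc/docker
echo '{ "storage-driver": "overlay2" }' > /etc/docker/daemon.json

# restart the engine and check which storage driver it picked up
service docker restart
docker info | grep -i 'storage driver'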
