I'm currently using docker-compose to run Airflow DAGs that are stored in local volumes. How can I use a GitHub repository as the volume for the DAGs instead? How would I set up the connection? Is it possible to use a GitHub repo as a volume at all?

My current settings in docker-compose.yaml:

  &airflow-common
  build: .
  env_file:
    - ./config/development.env
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    # For backward compatibility, with Airflow <2.3
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
    _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./includes:/opt/airflow/includes
  user: "${AIRFLOW_UID:-50000}:0"
  depends_on:
    &airflow-common-depends-on
    postgres:
      condition: service_healthy

  • I have not tested this, but you might be able to use [git clone as a command](https://gist.github.com/573/ae3b2f912116d141d74bd32dac5cda81) in the Docker Compose YAML. Personally, I just pull the code into the local directory that is then mounted as a volume for local development. For production on Kubernetes there is the option of a [git-sync sidecar](https://airflow.apache.org/docs/helm-chart/stable/manage-dags-files.html#). – TJaniF Jan 17 '23 at 14:57

1 Answer

The official Airflow Helm chart provides a git-sync sidecar container that keeps the DAG files in sync with a Git repository. The sidecar runs in the same pod as the scheduler, and in each worker pod, and periodically downloads the DAG files from the repo.

With Docker Compose you can do something similar: run a git-sync container in your stack and share a volume between it and the scheduler and worker containers.

For the git-sync container you can use the image k8s.gcr.io/git-sync/git-sync:v3.4.0 and mount the same volume into it and into the Airflow containers. The configuration options are documented in the git-sync project's GitHub repository, so check the docs there to configure the container.
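As a rough illustration of that setup, here is a minimal, untested sketch of how the compose file from the question might be adapted. The repository URL, the branch name, the `dags` volume name, and the 60-second interval are placeholders I made up; the `GIT_SYNC_*` variables are the environment-variable form of the git-sync v3 flags, so double-check them against the git-sync docs for the version you run. Pointing `AIRFLOW__CORE__DAGS_FOLDER` at the `repo` link mirrors what the Helm chart does, but treat the exact paths as assumptions.

```yaml
x-airflow-common:
  &airflow-common
  build: .
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    # git-sync publishes the checkout as a link named after GIT_SYNC_DEST,
    # so the DAGs folder points at that link inside the shared volume.
    AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/dags/repo
  volumes:
    # Named volume shared with git-sync, replacing ./dags:/opt/airflow/dags
    - dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
    - ./includes:/opt/airflow/includes

services:
  git-sync:
    image: k8s.gcr.io/git-sync/git-sync:v3.4.0
    environment:
      # Placeholder repository and branch; replace with your own.
      GIT_SYNC_REPO: https://github.com/<your-org>/<your-dags-repo>.git
      GIT_SYNC_BRANCH: main
      GIT_SYNC_ROOT: /tmp/git   # directory git-sync works in (the shared volume)
      GIT_SYNC_DEST: repo       # name of the link to the current checkout
      GIT_SYNC_WAIT: "60"       # seconds between syncs (assumed value)
      # For a private repo, git-sync v3 also reads GIT_SYNC_USERNAME and
      # GIT_SYNC_PASSWORD (for example a personal access token).
    volumes:
      - dags:/tmp/git
    restart: unless-stopped

volumes:
  dags:
```

Every service that merges `*airflow-common` (scheduler, webserver, and workers if you add them) then sees the same checkout. If git-sync reports permission errors on the volume, the ownership of the mount point is the first thing I would check, since the git-sync container does not run as root.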

Hussein Awala