I'm trying to set up Airflow with Docker. Every time I run Airflow in Docker I have to start the scheduler by hand: I get into the Airflow container's terminal and run `airflow scheduler` every time, so I want to automate this.
I have seen that it's common to run Airflow as three containers: an init container, the webserver, and the scheduler. I tried this, but it doesn't work, because `depends_on` in the Compose file only assures that one container starts after another; it doesn't wait for the first container to finish, hence my problem.
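From what I've read, the Compose spec has a long form of depends_on with conditions that can express "wait until the container has finished"; this is a minimal sketch of what I understand it to look like (assuming my Compose version supports these conditions; the service names match my file below):

airflow-webserver:
  depends_on:
    postgres:
      # waits only for the container to be started
      condition: service_started
    airflow-init:
      # waits for the init container to exit with code 0
      condition: service_completed_successfully

The same depends_on block would presumably go on airflow-scheduler as well, but I haven't managed to make this work yet.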
This is the docker compose file with airflow as three services:
version: '3.8'
services:
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: postgres_db
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    networks:
      - mynetwork
  airflow-init:
    build: ./airflow
    command: >
      bash -c "
      airflow db init &&
      airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
      "
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    depends_on:
      - postgres
    networks:
      - mynetwork
  airflow-webserver:
    build: ./airflow
    command: airflow webserver
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    ports:
      - 8080:8080
    depends_on:
      - airflow-init
      - postgres
    networks:
      - mynetwork
  airflow-scheduler:
    build: ./airflow
    command: airflow scheduler
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    depends_on:
      - airflow-init
      - postgres
    networks:
      - mynetwork
networks:
  mynetwork:
    driver: bridge
This is the docker compose file with airflow as one service (the one that works, but where I have to go into the airflow container's terminal and execute `airflow scheduler`):
version: '3.8'
services:
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: postgres_db
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    networks:
      - mynetwork
  airflow:
    build: ./airflow
    command: >
      bash -c "
      airflow db init &&
      airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com &&
      airflow webserver &&
      airflow scheduler
      "
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    ports:
      - 8080:8080
    depends_on:
      - postgres
    networks:
      - mynetwork
networks:
  mynetwork:
    driver: bridge
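My suspicion about this single-container file: `airflow webserver` runs in the foreground and never exits, so the `&& airflow scheduler` part is never reached, which would explain why I have to start the scheduler by hand. A rough workaround I've been considering (a sketch only, not tested, and I know running several processes in one container isn't ideal) is to background the scheduler and leave the webserver in the foreground:

airflow:
  build: ./airflow
  # YAML's '>' folds these lines into a single bash command line:
  # db init and user creation run first, then the scheduler is
  # backgrounded with '&', and the webserver stays in the foreground
  # as the container's main process.
  command: >
    bash -c "
    airflow db init &&
    airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com ;
    airflow scheduler &
    airflow webserver
    "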
The airflow scheduler logs:
2023-07-02 19:34:42 ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.6.2.
The airflow webserver logs:
2023-07-02 19:34:40 ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.6.2.
So, how can I fix this? Specifically, how can I run Airflow without having to type `airflow scheduler` in the container's terminal each time?
As a piece of additional information, the `Admin` user I created in Airflow is there because the default one (`airflow:airflow`) doesn't work. I tried to figure out why, but after a while cracking my head on it, I just decided to create a new user called `Admin`.
EDIT:
I have been trying to use the basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL (the reference docker-compose.yaml from the Airflow docs) as a starting point, but without success.
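One piece of that reference configuration that my files are missing, as far as I can tell, is a healthcheck on postgres, which the condition form of depends_on can then gate on. A sketch of what I understand it to look like (the interval and retry values below are my own guesses, not copied from the reference):

postgres:
  image: postgres:15.3
  healthcheck:
    # pg_isready exits 0 once the server accepts connections
    test: ["CMD", "pg_isready", "-U", "admin"]
    interval: 5s
    retries: 5

With that in place, the other services could presumably use `condition: service_healthy` on postgres instead of the plain list form of depends_on.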