
I'm trying to set up Airflow with Docker. Every time I run Airflow in Docker I have to start the scheduler by opening a terminal in the Airflow container and running `airflow scheduler`, and I want to automate this.

I have seen that it's common to use three containers for Airflow: initialization, webserver, and scheduler. I tried this, but it doesn't work, because `depends_on` in the Compose file only guarantees that one container starts after another; it doesn't wait for the first container to finish, hence my problem.
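From what I've read, the Compose specification also has a long form of `depends_on` that can actually wait for a container to exit successfully, which sounds like what I need. A sketch of what I think that would look like for the three-service file (I haven't confirmed that my docker compose version supports it):

```yaml
# Untested fragment: long-form depends_on (needs the Compose specification /
# docker compose v2). The short form I use below only orders container starts.
airflow-webserver:
  depends_on:
    airflow-init:
      condition: service_completed_successfully  # wait for init to exit 0
    postgres:
      condition: service_started
```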

This is the Compose file with Airflow split into three services:

version: '3.8'
services:
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: postgres_db
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    networks:
      - mynetwork
      
  airflow-init:
    build: ./airflow
    command: >
      bash -c "
        airflow db init && 
        airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
      "
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    depends_on:
      - postgres
    networks:
      - mynetwork
      
  airflow-webserver:
    build: ./airflow
    command: airflow webserver
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    ports:
      - 8080:8080
    depends_on:
      - airflow-init
      - postgres
    networks:
      - mynetwork

  airflow-scheduler:
    build: ./airflow
    command: airflow scheduler
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    depends_on:
      - airflow-init
      - postgres
    networks:
      - mynetwork

networks:
  mynetwork:
    driver: bridge

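On top of that, I suspect `airflow db init` can also race against Postgres itself starting up, since `depends_on: postgres` only waits for the container to start, not for the database to accept connections. An unverified healthcheck sketch (reusing the credentials above) that would at least let Compose know when the database is ready:

```yaml
# Untested fragment: mark postgres healthy once it accepts connections, so
# dependents could use `depends_on` with `condition: service_healthy`.
postgres:
  image: postgres:15.3
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U admin -d postgres_db"]
    interval: 5s
    timeout: 5s
    retries: 5
```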
This is the Compose file with Airflow as a single service (the one that works, but where I have to go into the Airflow container's terminal and execute `airflow scheduler`):

version: '3.8'
services:
  postgres:
    image: postgres:15.3
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: admin
      POSTGRES_DB: postgres_db
    volumes:
      - ./postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
    networks:
      - mynetwork
         
  airflow:
    build: ./airflow
    command: >
      bash -c "
        airflow db init && 
        airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com &&
        airflow webserver &&
        airflow scheduler
      "
    volumes:
      - ./airflow/dags:/opt/airflow/dags
      - ./csv data:/opt/csv_data
    ports:
      - 8080:8080
    depends_on:
      - postgres
    networks:
      - mynetwork

networks:
  mynetwork:
    driver: bridge

The `airflow scheduler` logs:

2023-07-02 19:34:42 ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.6.2.

The `airflow webserver` logs:

2023-07-02 19:34:40 ERROR: You need to initialize the database. Please run `airflow db init`. Make sure the command is run using Airflow version 2.6.2.

So, how can I fix this? Specifically, how can I run Airflow without having to type `airflow scheduler` in the container's terminal every time?

As a piece of additional information, the Admin user I create in Airflow is there because the default one (airflow:airflow) doesn't work. I tried to figure out why, but after a while banging my head against it I just decided to create a new user called Admin.

EDIT:

I have also been trying to use the basic Airflow cluster configuration for CeleryExecutor with Redis and PostgreSQL (the reference docker-compose.yaml) as a starting point, but without success.
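In the meantime, one variant of the single-service file I'm experimenting with is to background only the scheduler and keep the webserver in the foreground. An untested sketch of just the `command`, using the list form so the script's newlines survive (my original folded `>` scalar would collapse them into one line):

```yaml
# Untested variant of the single-service command. Plain newlines instead of
# && mean the trailing & backgrounds only the scheduler, not the whole chain;
# it also means failures of the earlier steps are not checked.
airflow:
  command:
    - bash
    - -c
    - |
      airflow db init
      airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com
      airflow scheduler &
      exec airflow webserver
```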

  • Do you realise the 3 separate services are just that, **separate**? Running `airflow db init` on the `airflow-init` service has no effect at all on the other 2. It's also unclear what you perceive the issue to be with the single service. You say you have to execute `airflow scheduler` manually but it is already part of your `command`. If that's not working, I think you should focus on why – Phil Jul 02 '23 at 23:03
  • It's not working because both the airflow webserver and airflow scheduler commands are long-running processes. When I put them in the same bash command string, the airflow webserver command starts and never returns control back to bash to execute the next command, which is airflow scheduler. Hence, the problem. – Chris Jul 03 '23 at 02:22
  • Ah, gotcha. Does this help? [Docker multiple entrypoints](https://stackoverflow.com/q/18805073/283366). Alternately, you could simply try running multiple background processes by suffixing the commands with `&`. [How i can run sh script inside docker-compose file in background?](https://stackoverflow.com/q/67128203/283366) – Phil Jul 03 '23 at 02:26

0 Answers