I'm setting up a distributed Airflow cluster where everything except the Celery workers runs on one host and the processing is done on several other hosts. The Airflow 2.0 setup is configured using the YAML file given in the Airflow documentation: https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml . In my initial tests I got the architecture to work nicely when I ran everything on the same host. The question is: how do I start the Celery workers on the remote hosts?
So far, I have tried to create a trimmed version of the above docker-compose file in which I start only the Celery workers on the worker host and nothing else, but I run into issues with the db connection. In the trimmed version I changed the URLs so that they point to the host that runs the db and Redis.
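Since the trimmed compose file only changes the connection URLs, one quick sanity check (stdlib only; the hostname below is the same placeholder as in the config) is to parse them and print the host and port the clients will actually dial. A stray extra `@` in the netloc, for example, silently ends up in the password instead of raising an error:

```python
from urllib.parse import urlsplit

# The three endpoints the trimmed worker compose has to reach; the host is a
# placeholder for the real main host, ports match the compose file below.
urls = {
    "sql_alchemy_conn": "postgresql+psycopg2://airflow:airflow@airflowhost.example.com:8080/airflow",
    "result_backend": "db+postgresql://airflow:airflow@airflowhost.example.com:8080/airflow",
    "broker_url": "redis://:@airflowhost.example.com:6380/0",
}

for name, url in urls.items():
    parts = urlsplit(url)
    # Print exactly what host/port the client will try to reach.
    print(f"{name}: host={parts.hostname!r} port={parts.port}")
```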
The dags, logs, plugins and the PostgreSQL db are located on a shared drive that is visible to all hosts.
How should I do the configuration? Any ideas what to check (connections etc.)? Celery worker docker-compose configuration:
---
version: '3'
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.1.0}
  environment:
    &airflow-common-env
    AIRFLOW_UID: 50000
    AIRFLOW_GID: 50000
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@airflowhost.example.com:8080/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@airflowhost.example.com:8080/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@airflowhost.example.com:6380/0
    AIRFLOW__CORE__FERNET_KEY: ''
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
    AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
    REDIS_PORT: 6380
  volumes:
    - /airflow/dev/dags:/opt/airflow/dags
    - /airflow/dev/logs:/opt/airflow/logs
    - /airflow/dev/plugins:/opt/airflow/plugins
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"

services:
  airflow-remote-worker:
    <<: *airflow-common
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
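Before blaming the compose file for the db connection issues, it is worth checking from the worker host that the Postgres and Redis ports (8080 and 6380 in the config above) are reachable at all; firewalls or services bound to localhost only would break the workers regardless of the Airflow config. A minimal stdlib sketch, with the hostname again a placeholder:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports taken from the compose file above; replace the placeholder hostname
# with the host that actually runs the db and Redis.
for service, port in [("postgres", 8080), ("redis", 6380)]:
    host = "airflowhost.example.com"
    print(f"{service} {host}:{port} reachable: {can_connect(host, port)}")
```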
EDIT 1: I'm still having some difficulties with the log files. It appears that sharing the log directory doesn't solve the issue of the missing log files. I added the extra_hosts definition on the main host as suggested and opened port 8793 on the worker machine. The worker tasks fail with this log:
*** Log file does not exist: /opt/airflow/logs/tutorial/print_date/2021-07-01T13:57:11.087882+00:00/1.log
*** Fetching from: http://:8793/log/tutorial/print_date/2021-07-01T13:57:11.087882+00:00/1.log
*** Failed to fetch log file from worker. Unsupported URL protocol ''
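The empty host in the fetch URL above suggests the worker isn't registering a resolvable hostname. For reference, the kind of wiring I'm attempting based on the suggestion: the worker service gets a stable hostname and publishes port 8793 (the default worker log server port), and the main host gets an extra_hosts entry so that hostname resolves. The hostname and IP below are placeholders:

```yaml
# On the worker host: stable hostname plus the log-serving port, so the
# webserver can fetch task logs back from this machine.
  airflow-remote-worker:
    <<: *airflow-common
    command: celery worker
    hostname: remote-worker-1
    ports:
      - "8793:8793"

# On the main host: map that hostname to the worker machine's IP so the
# webserver can resolve it.
  airflow-webserver:
    extra_hosts:
      - "remote-worker-1:192.0.2.10"
```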