
I am new to airflow and trying to setup airflow to run ETL pipelines. I was able to install

  1. airflow
  2. postgres
  3. celery
  4. rabbitmq

I am able to test-run the tutorial DAG. When I try to schedule the jobs, the scheduler picks them up and queues them (which I can see in the UI), but the tasks never actually run. Could somebody help me fix this issue?

Here is my config file:

[core]
airflow_home = /root/airflow
dags_folder = /root/airflow/dags
base_log_folder = /root/airflow/logs
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://xxxx.amazonaws.com:5432/airflow
api_client = airflow.api.client.local_client

[webserver]
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 30

[celery]
celery_app_name = airflow.executors.celery_executor
celeryd_concurrency = 16
worker_log_server_port = 8793
broker_url = amqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost
celery_result_backend = db+postgresql+psycopg2://postgres:airflow@xxx.amazonaws.com:5432/airflow
flower_host = 0.0.0.0
flower_port = 5555
default_queue = default

DAG: This is the tutorial DAG I used,

and the start date for my DAG is 'start_date': datetime(2017, 4, 11).
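For reference, the DAG is essentially the tutorial example from the Airflow docs; a trimmed sketch of it (assuming Airflow 1.8-style imports, with my start date) looks like this:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2017, 4, 11),
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
}

dag = DAG('tutorial', default_args=default_args,
          schedule_interval=timedelta(days=1))

# two simple BashOperator tasks; t2 runs after t1
t1 = BashOperator(task_id='print_date', bash_command='date', dag=dag)
t2 = BashOperator(task_id='sleep', bash_command='sleep 5', dag=dag)
t2.set_upstream(t1)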

Deepak S
  • Make sure your worker and scheduler share the same `celery_result_backend`, the same `dags_folder` and the same `broker_url` – jhnclvr Apr 20 '17 at 16:33
  • @René Hoffmann I made sure the celery result backend, dags folder and broker_url are the same for the scheduler and worker, and started airflow worker ... but it is still the same: jobs are getting queued but nothing runs – Deepak S Apr 20 '17 at 19:28

4 Answers


Have you run all three components of airflow, namely:

airflow webserver
airflow scheduler
airflow worker

If you only run the first two, the tasks will be queued but never executed; airflow worker provides the workers that actually execute the DAGs.

Also btw, celery 4.0.2 is not compatible with airflow 1.7 or 1.8 currently. Use celery 3 instead.
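If you are not sure which celery version actually got installed, a quick check from a Python shell is just to print celery's own version string:

import celery

# airflow 1.7/1.8 with the CeleryExecutor is reported to work with celery 3.x;
# a "4.0.2" here would be the likely culprit for queued-but-never-running tasks.
print(celery.__version__)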

Xia Wang
  • @XiaWang - can you point to any supporting documentation or otherwise that confirms that celery 4.0.2 is not compatible? In the apache github repo setup.py, the celery version is >=3.1.17, which doesn't really tell us whether it is or isn't supported. – Nick Jul 26 '17 at 14:09
  • @Nick Here's a relatively recent report on Apache's [issues webpage](https://issues.apache.org/jira/browse/AIRFLOW-630). I don't think the dependency issues with celery 4 have been solved by now. – Xia Wang Jul 29 '17 at 19:39
  • @XiaWang - thanks for the link. Interesting, I do not see this issue; I wonder if there are any other issues. I am not getting any exceptions with celery 4.x. Perhaps we need to review our past DAG run dependencies to see why things are not running right. – Nick Jul 30 '17 at 21:28
  • I'm also facing the same issue: all jobs are queuing but not executing. I'm using LocalExecutor (so there's no need to run "airflow worker"). Is there any way to resolve it? – MJK Nov 21 '17 at 18:26

I tried to upgrade to airflow v1.8 today as well and struggled with celery and rabbitmq. What helped was the change from librabbitmq (which is used by default when just using amqp) to pyamqp in airflow.cfg

broker_url = pyamqp://rabbit:rabbit@x.x.x.x/rabbitmq_vhost

(This is where I got the idea from: https://github.com/celery/celery/issues/3675)

Olaf

I realise your problem is already answered and was related to a celery version mismatch, but I've also seen tasks queue and never run because I changed the logs location to a place where the airflow service user did not have permission to write.

In the example airflow.cfg given in the question above: base_log_folder = /root/airflow/logs

I am using an AWS EC2 machine and changed the logs to write to base_log_folder = /mnt/airflow/logs

In the UI there is no indication as to why tasks are queued; it just says "unknown, all dependencies are met ...". Giving the airflow daemon/service user permission to write to the log folder fixed it.
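As a quick sanity check for this case, you can test whether the user running the scheduler/worker can actually write to the log folder (the path below is just the base_log_folder value from my setup; adjust it to yours):

import os

log_dir = '/mnt/airflow/logs'  # base_log_folder from airflow.cfg

# The scheduler and worker processes must be able to create per-task log files here;
# if either check prints False, tasks can sit in the queued state with no obvious error.
print('exists:  ', os.path.isdir(log_dir))
print('writable:', os.access(log_dir, os.W_OK))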

Davos

If LocalExecutor is a sufficient option for you, you can always go back to it. I've heard about some problems with CeleryExecutor.

Just change executor = CeleryExecutor to executor = LocalExecutor in your airflow.cfg file (most of the time ~/airflow/airflow.cfg).

Restart scheduler and that's it!

Jakub Bielan