I created a DAG and scheduled it to run daily. It gets queued every day, but its tasks never actually run. This problem has been raised here before, but the answers didn't help me, so the cause seems to be a different one.

My code is shared below. I replaced the SQL of task t2 with a comment. Each task runs successfully when I run it separately from the CLI using "airflow test...".

Can you explain what should be done to make the DAG run? Thanks!

This is the DAG code:

from datetime import timedelta, datetime
from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator



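default_args = {
    'owner': 'me',
    # Boolean rather than the string 'true' (any non-empty string is truthy)
    'depends_on_past': True,
    # No leading zero in the month: 06 is a syntax error on Python 3
    'start_date': datetime(2018, 6, 25),
    'email': ['myemail@moovit.com'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': timedelta(minutes=5)
}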


dag = DAG(
    'my_agg_table',
    default_args=default_args,
    schedule_interval="30 4 * * *"
)



t1 = BigQueryOperator(
    task_id='bq_delete_my_agg_table',
    use_legacy_sql=False,
    write_disposition='WRITE_TRUNCATE',
    allow_large_results=True,
    bql='''
    delete `my_project.agg.my_agg_table`
    where date = '{{ macros.ds_add(ds, -1)}}'
    ''',
    dag=dag)

t2 = BigQueryOperator(
    task_id='bq_insert_my_agg_table',
    use_legacy_sql=False,
    write_disposition='WRITE_APPEND',
    allow_large_results=True,
    bql='''
    #standardSQL
    Select ... the query continue here.....
    ''',
    destination_dataset_table='my_project.agg.my_agg_table',
    dag=dag)


t1 >> t2
dsesto
Saar Porat

1 Answer

It is usually easy to find out why a task is not being run. In the Airflow web UI:

  • select the DAG of interest
  • click on the task
  • click on Task Instance Details
  • the first row contains a panel named Task Instance State
  • the box Reason next to it states why the task is being run, or why it is being ignored

It usually makes sense to check the first task that is not being executed. I saw that you have set depends_on_past=True, which can cause problems when used in the wrong scenario.

More on that here: Airflow 1.9.0 is queuing but not launching tasks
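
To illustrate how depends_on_past can hold a task back, here is a minimal sketch in plain Python. It is a simplified toy model of the rule, not Airflow's actual implementation: the function name blocked_reason, the states dictionary, and the daily-schedule assumption are all hypothetical and exist only for this example.

```python
from datetime import date, timedelta

# States that this toy model treats as "the previous run finished fine".
OK_STATES = {"success", "skipped"}


def blocked_reason(run_date, start_date, states, depends_on_past):
    """Return None if the task may run, else a short reason string.

    `states` maps an execution date to the state of that task instance.
    A missing key means no task instance exists for that date.
    """
    if not depends_on_past:
        return None  # the flag is off, so earlier runs are irrelevant
    previous = run_date - timedelta(days=1)  # assumes a daily schedule
    if previous < start_date:
        return None  # the very first run has no predecessor to wait for
    if previous not in states:
        return "previous task instance has not run yet"
    if states[previous] not in OK_STATES:
        return "previous task instance is in state " + states[previous]
    return None


start = date(2018, 6, 25)
history = {date(2018, 6, 25): "success", date(2018, 6, 26): "failed"}

# First scheduled date: nothing to wait for.
print(blocked_reason(date(2018, 6, 25), start, history, True))
# The day after a failed run: blocked by the failure.
print(blocked_reason(date(2018, 6, 27), start, history, True))
# A date whose predecessor never ran: blocked, even weeks later.
print(blocked_reason(date(2018, 7, 12), start, history, True))
# Same date with the flag off: free to run.
print(blocked_reason(date(2018, 7, 12), start, history, False))
```

In this model a single failed or missing run blocks every later date, which is why the first task instance that did not execute is the one worth inspecting.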

tobi6
  • Thanks, tobi6! Your instructions were very helpful and exposed a possible reason. The problem isn't solved yet because the DAG is still stuck, but maybe that is for another question. In the task instance details depends_on_past is now false, but an error message still says "depends_on_past is true for this task's DAG, but the previous task instance has not run yet". – Saar Porat Jul 12 '18 at 08:27
  • I accepted the answer, thanks. Yes, I restarted both the scheduler and the webserver before I added my comment above. – Saar Porat Jul 12 '18 at 09:24
  • Strange. You might have to rename the DAG, e.g. by adding _v1 (my_agg_table_v1), and then check again. – tobi6 Jul 12 '18 at 09:29
  • Thanks. The rename worked, at least for the initial run. I hope it will keep running without problems on the daily schedule. – Saar Porat Jul 12 '18 at 10:25