
I have a big DAG with around 400 tasks that starts at 8:00 and runs for about 2.5 hours.

There are some smaller DAGs that need to start at 9:00. They are scheduled, but they cannot start until the first DAG finishes.

I reduced the big DAG's concurrency to 6 (concurrency=6). The DAG now runs only 6 tasks in parallel, but this does not solve the problem: tasks in the other DAGs still don't start.

There is no other global configuration limiting the number of running tasks, and the smaller DAGs usually run in parallel with each other.

What could be the issue here?

Airflow version: 2.1 with LocalExecutor and a Postgres backend, running on a 20-core server.

Tasks of active DAGs not starting

Drilon

2 Answers


I don't think it's related to concurrency. This could be related to Airflow using the mini-scheduler.

When a task finishes, the task supervisor process runs a "mini scheduler" that attempts to schedule more tasks of the same DAG. This lets the DAG finish sooner, because its downstream tasks are moved to the scheduled state directly. One side effect, however, is that it can starve other DAGs in some circumstances. A case like yours, where one very big DAG takes a long time to complete and starts before the smaller DAGs, may be exactly the situation in which starvation happens.

Try setting schedule_after_task_execution = False in the [scheduler] section of airflow.cfg; it should solve your issue.
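For reference, Airflow also reads any option from an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which takes precedence over the file. The sketch below uses only Python's standard library to parse a sample config fragment and derive the matching variable name. The helper `env_var_name` and the sample fragment are illustrative assumptions, not part of Airflow.

```python
import configparser

# Sample airflow.cfg fragment (illustrative; a real file has many more options).
SAMPLE_CFG = """
[scheduler]
schedule_after_task_execution = False
"""


def env_var_name(section: str, key: str) -> str:
    """Build the AIRFLOW__{SECTION}__{KEY} variable name (hypothetical helper)."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CFG)

# getboolean treats "false", "0", "no" and "off" (case-insensitive) as False.
mini_scheduler_enabled = parser.getboolean("scheduler", "schedule_after_task_execution")
override = env_var_name("scheduler", "schedule_after_task_execution")

print(mini_scheduler_enabled)
print(override)
```

Exporting that variable with the value `False` for the Airflow processes should have the same effect as editing the file.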

Elad Kalif

Why don't you trigger the next DAG once the previous one has finished? In the first DAG, add a task that triggers the next one, as follows:

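    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    trigger_new_dag = TriggerDagRunOperator(
        task_id="<task name>",             # placeholder: ID for this trigger task
        trigger_dag_id="<triggered dag>",  # placeholder: dag_id of the DAG to start
        dag=dag,                           # the first (big) DAG object
    )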

Placed after the last task of the first DAG, this operator triggers a run of the target DAG once the first DAG's work is done.

Documentation: https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/operators/trigger_dagrun/index.html

GuziQ