8

I want to build something where I need to capture all of the leaf tasks and add a downstream dependency to them to make a job complete in our database. Is there an easy way to find all the leaf nodes of a DAG in Airflow?

y2k-shubham
  • 10,183
  • 11
  • 55
  • 131
Ace Haidrey
  • 1,198
  • 2
  • 14
  • 27
  • The only way I know of right now is by checking the downstream_list and making sure it's empty. Is there a better way – Ace Haidrey Apr 21 '17 at 18:54

1 Answers1

4

Use upstream_task_ids and downstream_task_ids @property from BaseOperator

def get_start_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "head" / "root" tasks of DAG
    return [task for task in dag.tasks if not task.upstream_task_ids]


def get_end_tasks(dag: DAG) -> List[BaseOperator]:
    # returns list of "leaf" tasks of DAG
    return [task for task in dag.tasks if not task.downstream_task_ids]

Type-Annotations from Python 3.6+


UPDATE-1

Now Airflow DAG model has powerful @property functions like

y2k-shubham
  • 10,183
  • 11
  • 55
  • 131