I want to build something where I need to capture all of the leaf tasks and add a downstream dependency to them to make a job complete in our database. Is there an easy way to find all the leaf nodes of a DAG in Airflow?
Asked
Active
Viewed 3,419 times
8
-
The only way I know of right now is by checking the downstream_list and making sure it's empty. Is there a better way – Ace Haidrey Apr 21 '17 at 18:54
1 Answers
4
Use upstream_task_ids
and downstream_task_ids
@property
from BaseOperator
def get_start_tasks(dag: DAG) -> List[BaseOperator]:
# returns list of "head" / "root" tasks of DAG
return [task for task in dag.tasks if not task.upstream_task_ids]
def get_end_tasks(dag: DAG) -> List[BaseOperator]:
# returns list of "leaf" tasks of DAG
return [task for task in dag.tasks if not task.downstream_task_ids]
Type-Annotations
from Python 3.6+
UPDATE-1
Now Airflow DAG
model has powerful @property
functions like

y2k-shubham
- 10,183
- 11
- 55
- 131
-
Also see: [get all parent task_ids](https://stackoverflow.com/a/54728871/3679900) – y2k-shubham Jul 11 '19 at 17:02