
I am reading this question while trying to implement dependencies on tasks in other DAGs. In that example, each dependency is written as:

ExternalTaskSensor(
    task_id='wait_for_the_first_task_to_be_completed',
    external_dag_id='a',
    external_task_id='first_task',
    dag=dag) >> \

In my data warehouse, one table may depend on hundreds of tasks. In this format, the code grows to roughly 2 * (number of dependencies) lines, which is unacceptable. Is there a better option?

For instance, in Azkaban, I can declare multiple dependencies like this:

dependencies = dag1.task1, dag2.task4, dag2.task5, DAG3.task2, etc...

Any help is appreciated.

user2894829

1 Answer


You can create your sensors in a loop and set the dependencies inside it. I think it's cleaner, but I'm not sure whether it meets your requirement on the amount of code as the number of dependencies increases.

Example:

dependencies = [('dag1', 'task1'), ('dag2', 'task4'), ('dag2', 'task5'), ('dag3', 'task2')]

other_task = PythonOperator(...)

for dag_id, task_id in dependencies:
    sensor = ExternalTaskSensor(
        task_id='wait_for_{0}.{1}'.format(dag_id, task_id),
        external_dag_id=dag_id,
        external_task_id=task_id,
        dag=dag)
    sensor >> other_task
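
If you would rather keep the one-line Azkaban-style string from the question, a small parser could build the `dependencies` list for the loop above. This is only a sketch: `parse_dependencies` is a hypothetical helper, not part of Airflow, and it assumes each entry has the form `dag_id.task_id` with no dots in the DAG id.

```python
def parse_dependencies(spec):
    """Turn 'dag1.task1, dag2.task4' into [('dag1', 'task1'), ('dag2', 'task4')].

    Hypothetical helper, not an Airflow API.
    """
    pairs = []
    for item in spec.split(','):
        item = item.strip()
        if not item:
            continue  # tolerate trailing commas and blank entries
        # Split on the first dot only. Assumption: the DAG id contains no dots;
        # anything after the first dot is treated as the task id.
        dag_id, sep, task_id = item.partition('.')
        if not sep or not dag_id or not task_id:
            raise ValueError('expected "dag_id.task_id", got {!r}'.format(item))
        pairs.append((dag_id, task_id))
    return pairs


# Same list of (dag_id, task_id) tuples that the loop above iterates over.
dependencies = parse_dependencies('dag1.task1, dag2.task4, dag2.task5, dag3.task2')
```

With this, adding a dependency means adding one entry to the string (or to a config file that holds it), and the sensor loop itself stays unchanged.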
Daniel Huang