
I just started with Airflow. I want to set up DAGs in a loop, where the next DAG starts when the previous one has completed. Here is the workflow I want to achieve:

list_of_files = [......]
for file in list_of_files:
   dag = DAG('pipeline', default_args=default_args, schedule_interval=None)
   t1 = BashOperator(task_id='copy_this_file', ...., dag=dag)
   t2 = BashOperator(task_id='process_this_file', ..., dag=dag)
   t1.set_downstream(t2)

If I run `airflow backfill pipeline -s 2019-05-01`, all the tasks start simultaneously.

JMarc

1 Answer


DAGs can't depend on each other; they are separate workflows. You want to configure tasks to depend on each other instead. You can have a single DAG with multiple execution branches, one per file, something like this (not tested):

dag = DAG('pipeline', ...)
list_of_files = [......]
with dag:
    for file in list_of_files:
        # task_ids must be unique within a DAG, so include the file name
        t1 = BashOperator(task_id='copy_%s' % file, ....)
        t2 = BashOperator(task_id='process_%s' % file, ...)
        t1.set_downstream(t2)
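
Since the question asks for each file to start only after the previous one has finished, you can also chain the per-file branches end-to-end by keeping a reference to the previous file's last task. A minimal untested sketch, assuming the Airflow 1.x API used above; the file names, `bash_command` values, and `default_args` are placeholders, not from the original post:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Placeholder defaults; adjust to your environment.
default_args = {'start_date': datetime(2019, 5, 1)}

dag = DAG('pipeline', default_args=default_args, schedule_interval=None)

list_of_files = ['a.csv', 'b.csv', 'c.csv']  # placeholder file names

previous_task = None
with dag:
    for file in list_of_files:
        safe_name = file.replace('.', '_')  # keep task_ids unique and tidy
        t1 = BashOperator(task_id='copy_%s' % safe_name,
                          bash_command='cp /input/%s /work/' % file)
        t2 = BashOperator(task_id='process_%s' % safe_name,
                          bash_command='process /work/%s' % file)
        t1 >> t2                     # copy before process, per file
        if previous_task is not None:
            previous_task >> t1      # this file waits for the previous one
        previous_task = t2
```

With this layout the scheduler still sees a single DAG, but the dependency edges force the files to be handled one after another instead of in parallel.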
bosnjak