Say I have two pre-existing DAGs, A and B. Is it possible in Airflow to "copy" all tasks from B into A, preserving dependencies and propagating default arguments and the like for all of B's tasks? With the end goal being to have a new DAG A' that contains all of the tasks of both A and B.

I understand that it might not be possible or feasible to reconcile DAG-level factors, e.g. scheduling, and propagate across the copying, but is it possible to at least preserve the dependency orderings of the tasks such that each task runs as expected, when expected—just in a different DAG?

If it is possible, what would be the best way to do so? If it's not supported, is there work in progress to support this sort of "native DAG composition"?

Jonathan Jin

1 Answer

UPDATE-1

Based on the clarification of the question, I infer that the requirement is not to replicate one DAG into another but to append it after another DAG.

  1. The techniques mentioned in the original answer below are still applicable (to varying extents)

  2. But for this specific use case there are a few more options

    i. Use TriggerDagRunOperator: invoke your 2nd DAG at the end of the 1st DAG

    ii. Use SubDagOperator: wrap your 2nd DAG in a SubDAG and attach it at the end of the 1st DAG

    But do check out the "Wiring top-level DAGs together" thread (question, answer, and comments) for ideas on, and pitfalls of, each of the techniques mentioned above
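Whichever operator you pick, the "append" semantics are the same: every root task of the 2nd DAG has to wait on every leaf task of the 1st. Here is a minimal sketch of that wiring on plain dependency maps, without touching Airflow at all. The `append_graph` helper, the `b__` prefix and the task names are all made up for illustration; in a real DAG file you would create operators from the merged map and set their upstream tasks accordingly.

```python
from typing import Dict, Set

# Assumed representation: task_id -> ids of the tasks it depends on (upstream).
Graph = Dict[str, Set[str]]


def append_graph(a: Graph, b: Graph, prefix: str = "b__") -> Graph:
    """Return a new graph with all tasks of `a` and `b`, where `b` runs after `a`."""
    # Leaves of `a`: tasks that no other task in `a` lists as upstream.
    has_downstream = {up for ups in a.values() for up in ups}
    a_leaves = {t for t in a if t not in has_downstream}

    merged: Graph = {t: set(ups) for t, ups in a.items()}
    for task, ups in b.items():
        new_id = prefix + task  # hypothetical prefix to avoid task_id clashes
        if new_id in merged:
            raise ValueError(f"task id collision: {new_id}")
        # Keep b's internal edges; roots of b wait on every leaf of a.
        merged[new_id] = {prefix + u for u in ups} if ups else set(a_leaves)
    return merged


dag_a = {"extract": set(), "load": {"extract"}}
dag_b = {"report": set(), "email": {"report"}}
combined = append_graph(dag_a, dag_b)
```

Only the dependency ordering is carried over here. DAG-level settings such as schedule or default arguments would still have to be reconciled by hand.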


ORIGINAL ANSWER

I can think of 3 possible ways

  1. The recommended way would be to programmatically construct your DAG. In other words, if possible, iterate over a list of configs (one config per task) read from an external source (such as an Airflow Variable, a database, or JSON files) and build your DAG as per your business logic. Here, you'll just have to alter the dag_id and you can re-use the same script to build a DAG identical to your original one

  2. A modification of the 1st approach above is to generalize your DAG-construction logic by using a simple tool like ajbosco/dag-factory or a full-fledged wrapper framework like etsy/boundary-layer

  3. Finally, if none of the above approaches is easily adaptable for you, you can hand-code the task-replication logic to regenerate the same structure as your original DAG. You can write a single robust script and re-use it across your entire project to replicate DAGs as and when needed. Here you'll have to do some DAG traversal and apply some traditional data structures and algorithms. Here's an example of BFS-like traversal over tasks of an Airflow DAG
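To give a feel for the traversal in the 3rd approach, here is a minimal BFS sketch. So that it runs without Airflow installed, it uses a hypothetical stand-in `Task` class that exposes only `task_id` and `downstream_list`, which are the attribute names Airflow operators use. With a real DAG you would (I assume) start from `dag.roots` instead. `bfs_task_ids` is also just a name chosen for this example.

```python
from collections import deque
from typing import Iterable, List


class Task:
    """Hypothetical stand-in exposing only `task_id` and `downstream_list`."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.downstream_list: List["Task"] = []

    def set_downstream(self, other: "Task") -> None:
        self.downstream_list.append(other)


def bfs_task_ids(roots: Iterable[Task]) -> List[str]:
    """Visit every task reachable from `roots` exactly once, breadth-first."""
    queue = deque(roots)
    seen = {t.task_id for t in queue}
    order: List[str] = []
    while queue:
        task = queue.popleft()
        order.append(task.task_id)
        for child in task.downstream_list:
            if child.task_id not in seen:  # a task may have several parents
                seen.add(child.task_id)
                queue.append(child)
    return order


# Diamond-shaped example: a >> [b, c] >> d
a, b, c, d = (Task(name) for name in "abcd")
a.set_downstream(b)
a.set_downstream(c)
b.set_downstream(d)
c.set_downstream(d)
order = bfs_task_ids([a])
```

Note that BFS order is not guaranteed to be a topological order in general. For replication that is fine: visit each task once to re-create it in the new DAG, then re-link the edges in a second pass.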

y2k-shubham
  • I suspect that I might not have expressed my question clearly in the initial post. I've gone ahead and edited it to clarify. My goal here is not so much to entirely recreate a pre-existing DAG (for which you've given good suggestions here) but rather to take the tasks of DAG B and "append" them to DAG A to create, say, DAG A' that contains all tasks of both A and B. Hope that helps. – Jonathan Jin Aug 16 '19 at 10:55