
The Airflow documentation clearly states:

SubDAGs must have a schedule and be enabled. If the SubDAG’s schedule is set to None or @once, the SubDAG will succeed without having done anything

Although we should stick to the documentation, I've found that SubDAGs work without a hiccup even with schedule_interval set to None or @once. Here's my working example.
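
For reference, a minimal sketch of that kind of setup (the DAG names, task names and dates below are hypothetical stand-ins, not the original code): a parent DAG on a daily schedule wrapping a SubDAG whose schedule_interval is left as None.

```python
# Minimal sketch (hypothetical names): parent DAG with a SubDAG whose
# schedule_interval is None, i.e. the case the docs warn about.
from datetime import datetime

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.subdag_operator import SubDagOperator

DEFAULT_ARGS = {
    'owner': 'airflow',
    'start_date': datetime(2018, 1, 1),
}


def build_subdag(parent_dag_id, task_id, default_args):
    # The SubDAG's dag_id must be '<parent_dag_id>.<task_id>'.
    subdag = DAG(
        dag_id='{}.{}'.format(parent_dag_id, task_id),
        default_args=default_args,
        schedule_interval=None,  # set to None despite the docs' warning
    )
    DummyOperator(task_id='subdag_task', dag=subdag)
    return subdag


parent_dag = DAG(
    dag_id='parent_dag',
    default_args=DEFAULT_ARGS,
    schedule_interval='@daily',
)

start = DummyOperator(task_id='start', dag=parent_dag)

subdag_task = SubDagOperator(
    task_id='my_subdag',
    subdag=build_subdag('parent_dag', 'my_subdag', DEFAULT_ARGS),
    dag=parent_dag,
)

start >> subdag_task
```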


My current understanding (I first heard about Airflow only 2 weeks back) of SubDagOperators (or SubDAGs) is:


My questions are:

  • Why does my example work when it shouldn't?
  • Why shouldn't my example work (as per the docs) in the first place?
  • Are there any subtle differences between the behaviour of SubDagOperator and other operators?
  • When solutions to the known problems exist, why is there so much uproar against SubDagOperators?

I'm using puckel/docker-airflow with:

  • Airflow 1.9.0-4
  • Python 3.6-slim
  • CeleryExecutor with redis:3.2.7
y2k-shubham

1 Answer


If you are just running your DAG once, then you probably won't have any issues with SubDAGs (as in your example) - especially if you have a bunch of worker slots available. Try letting a few DagRuns of your example accumulate, then see whether everything still runs smoothly when you delete and re-run some of them.
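
As a rough illustration of that kind of test (the dag_id and dates are hypothetical), clearing and re-running a past window is normally done with the airflow clear and airflow backfill CLI commands, or equivalently on the DAG object itself:

```python
# Hypothetical sketch: wipe and re-run a window of past DagRuns for a DAG
# named 'parent_dag' -- the kind of operation where SubDAGs tend to misbehave.
from datetime import datetime

from airflow.models import DagBag

dag = DagBag().get_dag('parent_dag')  # hypothetical dag_id

# Clear the task instances (SubDAG tasks included) for a past window...
dag.clear(
    start_date=datetime(2018, 1, 1),
    end_date=datetime(2018, 1, 7),
    include_subdags=True,
)

# ...then re-run that window as a backfill.
dag.run(
    start_date=datetime(2018, 1, 1),
    end_date=datetime(2018, 1, 7),
)
```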

The community has advised moving away from SubDags because unexpected behavior starts happening when you need to re-run old DagRuns or run bigger backfills.

It is not so much that the DAG won't work, but more that unexpected things can happen that may affect your workflows, and that risk isn't worth taking when all you are getting in return is a nicer-looking DAG.

Even though known solutions exist, implementing them may not be worth the effort.

Viraj Parekh