
Could someone tell me whether a DAG in Airflow is just a graph (a kind of placeholder) with no actual data (such as arguments) attached to it, or whether a DAG is more like an instance bound to a fixed set of arguments?

I want a system where the set of operations to perform (given a set of arguments) is fixed, but the input is different every time the operations run. In simple terms, the pipeline stays the same, but its arguments differ on every run.

How do I configure this in Airflow? Should I create a new DAG for every new set of arguments, or is there another approach?

In my case, the graph is the same, but I want to run it on different data (from different users) as that data arrives. So should I create a new DAG every time new data comes in?

Ajay
  • Read the **`EDIT-1`** parts [here](https://stackoverflow.com/a/54746434/3679900) and [here](https://stackoverflow.com/a/55132959/3679900). – y2k-shubham Oct 09 '19 at 02:20

3 Answers


Yes, you are correct: a DAG is basically a one-way (directed, acyclic) graph. You create a DAG once by chaining multiple operators together to form your "structure".

Each operator can then take multiple arguments, which you can pass from the DAG definition file itself if needed.

Alternatively, you can pass a configuration object to the DAG when you trigger a run, and read your custom data from it through the task context.
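A minimal sketch of the configuration-object approach, assuming Airflow 2.4+ (where `PythonOperator` passes the task context to the callable as keyword arguments and `DAG` accepts `schedule`). The DAG id `user_pipeline`, the task id, and the conf keys `user_id` and `input_path` are hypothetical. The callable reads `dag_run.conf`, which is filled in when a run is triggered with a configuration, for example via `airflow dags trigger --conf '{"user_id": 42, "input_path": "/data/u42.csv"}' user_pipeline`.

```python
from datetime import datetime


def process_user_data(**context):
    """Task callable: reads this run's arguments from the triggering DagRun's conf.

    The conf keys "user_id" and "input_path" are hypothetical examples.
    """
    conf = context["dag_run"].conf or {}
    missing = [key for key in ("user_id", "input_path") if key not in conf]
    if missing:
        raise ValueError(f"missing keys in dag_run.conf: {missing}")
    # Real processing would go here; the sketch only reports what it would do.
    return f"processing {conf['input_path']} for user {conf['user_id']}"


try:
    # Build the DAG only where Airflow is installed, so the callable above
    # can also be imported and tested on its own.
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None

if DAG is not None:
    with DAG(
        dag_id="user_pipeline",           # hypothetical DAG id
        start_date=datetime(2019, 1, 1),  # fixed start date, never edited later
        schedule=None,                    # no timetable: runs only when triggered
    ) as dag:
        process_task = PythonOperator(
            task_id="process_user_data",
            python_callable=process_user_data,
        )
```

Triggering this one DAG twice with different conf values gives two DAG runs of the same unchanged graph, which is the "same pipeline, different arguments" setup the question describes.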

I recommend reading the Airflow docs for more examples: https://airflow.apache.org/concepts.html#tasks

Alice
  • Wouldn't it be better if a DAG were just a placeholder graph (with no params assigned), and we could create a wrapper (named DagInstance) that takes params and creates DAGs? I feel that model is more intuitive. As it stands, a DAG is not a generic graph; it is a workflow defined with its params already fixed. Does that make sense? – Ajay Oct 10 '19 at 02:46

You can think of an Airflow DAG as a program made of other programs, except that it can't contain loops (it is acyclic). Would you change your program every time the input data changes? Of course, that depends on how you write it, but usually you want your program to generalise, right? You don't want two different programs to compute 2+2 and 3+3, although you will have different programs to show Facebook pages and to play Pokemon Go. If you want to do the same thing to similar data, write your DAG once and perhaps change only the environment arguments (DB connection, date, etc.). Airflow is perfectly suited to that.

Artem Vovsia

You do not need to create a new DAG every time if the structure of the graph stays the same.

Airflow DAGs are created via code, so you are free to create a code structure that allows you to pass in arguments each time. How you do that will require some creative thinking.

You could, for example, create a web form that accepts the arguments, stores them in a DB, and then triggers the DAG through the Airflow REST API. The DAG code would then need to be written to retrieve its params from the database.
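A rough sketch of that flow, assuming Airflow 2's stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`, which accepts a `conf` object in its JSON body) and an authentication backend already configured on the webserver. SQLite stands in for "a DB", and the `submissions` table, the helper functions, and the `submission_id` conf key are all hypothetical. The form handler stores the submission and passes only its id in the run's conf; the DAG task then looks the arguments up.

```python
import json
import sqlite3
import urllib.request


def store_submission(conn, user_id, input_path):
    """Web-form side (hypothetical): persist the user's arguments, return the row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        "id INTEGER PRIMARY KEY, user_id TEXT, input_path TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO submissions (user_id, input_path) VALUES (?, ?)",
        (user_id, input_path),
    )
    conn.commit()
    return cur.lastrowid


def build_trigger_request(base_url, dag_id, submission_id):
    """Build (but do not send) a POST that creates a DAG run carrying the id in its conf."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": {"submission_id": submission_id}}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


def load_submission(conn, submission_id):
    """DAG-task side (hypothetical): fetch the arguments the triggering run refers to."""
    row = conn.execute(
        "SELECT user_id, input_path FROM submissions WHERE id = ?", (submission_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"no submission with id {submission_id}")
    return {"user_id": row[0], "input_path": row[1]}
```

Sending the request would be `urllib.request.urlopen(req)` plus whatever authentication your deployment requires, and inside the task `load_submission(conn, context["dag_run"].conf["submission_id"])` recovers the arguments. Passing only an id keeps the run's conf small and leaves the DB as the single source of truth for user input.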

There are several other ways to accomplish what you are asking; which one fits depends on your use case. One caveat: the Airflow scheduler does not cope well with changes to a DAG's start date. For the idea above, set the start date earlier than your first DAG run and set the schedule interval to None (off). That way the start date never changes, and DAG runs are triggered dynamically.
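The DAG arguments for that setup might look like the sketch below, assuming Airflow 2.4+, where `schedule=None` means the DAG has no timetable and runs only when triggered (1.10-era code spells this `schedule_interval=None`). The DAG id is hypothetical; the start date is simply some fixed date before the first intended run.

```python
from datetime import datetime, timezone

# Arguments for a trigger-only DAG: a start date fixed in the past that is never
# edited afterwards, and no schedule, so runs are created only by external triggers.
DAG_KWARGS = {
    "dag_id": "user_pipeline",  # hypothetical DAG id
    "start_date": datetime(2019, 1, 1, tzinfo=timezone.utc),
    "schedule": None,           # Airflow 2.4+ name; 1.10-era code uses schedule_interval=None
}

try:
    from airflow import DAG
except ImportError:  # lets the sketch run without Airflow installed
    DAG = None

dag = DAG(**DAG_KWARGS) if DAG is not None else None
```

Because nothing in `DAG_KWARGS` depends on the incoming data, every user submission becomes a new run of this same DAG rather than a new DAG definition.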

trejas
  • My requirement is that these DAG runs (generic DAG graph + specific user input) must be scheduled at different times depending on the user input. In that case, I need to create a new DAG for every input, right? Sorry, I am new to Airflow. – Ajay Oct 09 '19 at 15:18