
I am using the LocalExecutor. I have a situation where a unique DAG gets generated for each request ID, e.g. 1.py, 2.py.

Assume 1.py has two tasks and 2.py has three tasks. I also receive more DAGs periodically, e.g. 3.py, 4.py, and so on.

Is there any problem with creating a DAG for every new request ID?

I have observed that the scheduler keeps emitting this log line:

Started a process (PID: 92186) to generate tasks for /Users/nshar141/airflow/dags/3.py - logging into /Users/nshar141/airflow/logs/scheduler/2018-05-07/3.py.log

My question here is why the scheduler keeps spawning a separate PID to generate tasks. I tried changing different parameters in the config related to concurrency and parallelism, but the scheduler seems to execute that statement every time, for every DAG present in the dags folder.

I am attaching my DAG definition. I want to run the DAG as soon as it is created. What values should I give for start_date and schedule_interval?

from datetime import datetime
from airflow import DAG

dag = DAG('3', description='Sample DAG', schedule_interval='@once', start_date=datetime(2018, 5, 7), catchup=False)
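(Note the corrected parameter names, start_date and schedule_interval, and that '@once' must be quoted; as far as I know, with schedule_interval='@once' and a start_date in the past, the scheduler triggers a single run as soon as it first parses the file, so a definition like this should already run the DAG on creation.)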

Since I need to generate DAGs dynamically, each with a unique DAG ID, and place them in the dags folder, my concern is that the scheduler would keep spawning processes for every DAG in the folder, including DAGs that have already been executed.

Image showing scheduler generating PIDs repeatedly

Nitish Sharma

1 Answer


Why do you want to create a new DAG for every request? I think that the most appropriate way would be to store requests and have a single DAG perform your logic for multiple requests at the same time, in a batch fashion. You can run your DAG very often if you want.
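A minimal sketch of that batch pattern, assuming Airflow 1.x import paths; the storage helpers, DAG id, and five-minute schedule are illustrative assumptions, not something from this thread:

from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def read_pending_request_ids():
    # Placeholder: in practice, read request IDs from wherever they are
    # stored (a database table, a queue, a staging directory, ...).
    return []

def handle_request(request_id):
    # Placeholder: per-request logic goes here.
    print('processing request %s' % request_id)

def process_pending_requests():
    # One task processes every request that arrived since the last run.
    for request_id in read_pending_request_ids():
        handle_request(request_id)

dag = DAG(
    'process_requests',
    description='Single DAG handling all requests in batches',
    schedule_interval='*/5 * * * *',  # e.g. every five minutes
    start_date=datetime(2018, 5, 7),
    catchup=False,
)

process_batch = PythonOperator(
    task_id='process_batch',
    python_callable=process_pending_requests,
    dag=dag,
)

Each scheduled run picks up everything that is pending, so failures are still visible per run in the UI without a new DAG per request.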

You seem to want tasks to be executed as soon as possible. If you're interested in near real-time with a lot of throughput, Airflow may not be appropriate and you'll want to use a message queue instead.

Antoine Augusti
  • I want to assign a unique DAG ID to each request so that in the Airflow UI I can segregate every request if something fails. But if I have only one DAG doing things in batches for different requests, I will have a huge list of tasks associated with a single DAG, which would be hard to follow and track. I am trying something like this: https://stackoverflow.com/questions/39133376/airflow-dynamic-dag-and-task-ids/39666991 but the issue is how do I restrict the scheduler to scan my master.py every 1 min instead of on every heartbeat (so that one ID is assigned to one DAG, with no race conditions)? – Nitish Sharma May 12 '18 at 21:47
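On the scan-frequency question in the comment: the [scheduler] section of airflow.cfg has a min_file_process_interval setting that throttles how often each DAG file is re-parsed. A minimal sketch; the 60-second value is an assumption chosen to match the one-minute requirement:

[scheduler]
# Minimum number of seconds between two parses of the same DAG file.
# With 60, master.py (and every other file in the dags folder) is
# processed at most once per minute rather than on every heartbeat.
min_file_process_interval = 60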