I am trying to schedule a dag to run every x seconds. I set the start time to a past date with catchup = False and the end time to a few seconds into the future.

Although the dag starts as expected, it does not end and goes on forever.

The dag ends if I use an absolute end time like datetime(2019,9,26), but not with datetime.now()+timedelta(seconds=100).

from datetime import datetime, timedelta

from airflow import DAG

start_date = datetime(2019, 1, 1)
end_date = datetime.now() + timedelta(seconds=200)

default_args = {
    "owner": "airflow",
    "depends_on_past": True,
    "start_date": start_date,
    "end_date": end_date
}

dag = DAG("file_dag", catchup=False, default_args=default_args, schedule_interval=timedelta(seconds=20), max_active_runs=1)

I expect the dag to stop executing after maybe 10 or 11 runs, depending on when it started. But it keeps executing even after 20 runs and does not seem to stop.

y2k-shubham

1 Answer

You cannot / must not use datetime.now() in start_date and end_date expressions


The behaviour that you are observing is expected:

  • Recall that dag-definition files are parsed continuously in the background. Section [6], "Restrict the number of Airflow variables in your DAG", of "Airflow: Lesser Known Tips, Tricks and Best Practices" says:

    Your DAG files are parsed every X seconds

  • On each parsing cycle of your dag-definition file, end_date is recomputed as 200 seconds after the current time. Since parsing of dag-definition file(s) goes on forever, end_date keeps shifting forward and you get a never-ending dag.
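
The effect can be reproduced without Airflow. The sketch below is only an illustration: the two parse functions, the simulate helper, and the 30-second parse interval are hypothetical stand-ins for the scheduler re-reading the dag file, not Airflow APIs. It checks, on each simulated parse, whether the clock has passed the end_date that the parse produced.

```python
from datetime import datetime, timedelta

# Absolute end date: identical on every parse of the file.
FIXED_END = datetime(2019, 9, 26)


def parse_with_relative_end(now):
    """Mimics a dag file that derives end_date from the current time."""
    return now + timedelta(seconds=200)


def parse_with_fixed_end(now):
    """Mimics a dag file with a hard-coded end_date."""
    return FIXED_END


def simulate(parse, first_parse, parse_every, n_parses):
    """Re-'parse' the file n_parses times and record, for each parse,
    whether the clock is already past the end_date it produced."""
    results = []
    for i in range(n_parses):
        now = first_parse + i * parse_every
        end_date = parse(now)
        results.append(now > end_date)
    return results


# Assumed values: first parse one minute before the fixed end date,
# then a re-parse every 30 seconds (hypothetical interval).
first_parse = datetime(2019, 9, 25, 23, 59, 0)
relative = simulate(parse_with_relative_end, first_parse, timedelta(seconds=30), 20)
fixed = simulate(parse_with_fixed_end, first_parse, timedelta(seconds=30), 20)

print(any(relative))  # False: end_date is always 200 s ahead of the clock
print(any(fixed))     # True: the clock eventually passes the fixed end_date
```

With the relative expression the end is never reached no matter how many times the file is parsed, which matches the never-ending dag in the question; with the absolute datetime the end is reached once the clock passes it.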

y2k-shubham