
I have a long-running task that loops through calling a REST endpoint to get data, maybe hundreds of times, and can take up to 1 hour. While the task is still running, I see multiple attempts (retries), even though I specifically set retries=0 on the PythonOperator that calls this particular function. The default_args I pass into the DAG also has "retries": 0.
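For context, the setup looks roughly like this (names and the callable body are illustrative, not my actual code; assuming Airflow 2.x):

```python
# Minimal sketch of the DAG described above (illustrative names).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_all_pages():
    # Loops over a paginated REST endpoint hundreds of times;
    # a single run may take up to an hour.
    ...


default_args = {"retries": 0}

with DAG(
    dag_id="long_running_fetch",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    default_args=default_args,
) as dag:
    fetch = PythonOperator(
        task_id="fetch_data",
        python_callable=fetch_all_pages,
        retries=0,  # also set explicitly on the operator itself
    )
```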


So I'm not sure what is retrying the task while it is still running, and there is no error whatsoever in the logs.

In the airflow.cfg file I have changed the following settings:

job_heartbeat_sec = 3600
scheduler_heartbeat_sec = 3600
scheduler_zombie_task_threshold = 3600
task_timeout = 3600
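One way to double-check which values the running installation actually resolves (rather than what's in the file) is the `airflow config get-value` CLI, assuming Airflow 2.x and that these options live under the `[scheduler]` section:

```shell
# Print the effective values the scheduler sees at runtime
# (section name assumed to be "scheduler"; adjust if your cfg differs).
airflow config get-value scheduler job_heartbeat_sec
airflow config get-value scheduler scheduler_heartbeat_sec
airflow config get-value scheduler scheduler_zombie_task_threshold
```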

But the symptom persists.

What else could be doing this?

The time between the retries as shown in the logs:

[2023-02-24, 02:32:12 UTC] {{taskinstance.py:1363}} INFO - Starting attempt 1 of 1
[2023-02-24, 02:34:30 UTC] {{taskinstance.py:1363}} INFO - Starting attempt 2 of 1
[2023-02-24, 02:39:31 UTC] {{taskinstance.py:1363}} INFO - Starting attempt 3 of 1

The gap between attempts 2 and 3 (and, when it retries more than 3 times, between 3 and 4, 4 and 5, etc.) is almost exactly 5 minutes. I searched for "300" (seconds) in my airflow.cfg and only found dag_dir_list_interval set to 300, which doesn't seem related. I'm also not sure why the gap between attempts 1 and 2 is a seemingly random length of 2 minutes and 18 seconds (i.e. not 5 minutes).
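For reference, the gaps can be computed directly from the log timestamps above:

```python
# Compute the intervals between the "Starting attempt" log lines.
from datetime import datetime

stamps = [
    "2023-02-24 02:32:12",  # attempt 1
    "2023-02-24 02:34:30",  # attempt 2
    "2023-02-24 02:39:31",  # attempt 3
]
times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
deltas = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(deltas)  # → [138.0, 301.0], i.e. 2 min 18 s, then ~5 min
```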

nismoh
