
Hi, I'm currently running Airflow on a Dataproc cluster. My DAGs used to run fine, but I'm now facing an issue where tasks end up in the 'retry' state without any logs when I click on task instance -> Logs in the Airflow UI.

I see the following error in the terminal where I started the Airflow webserver:

2022-06-24 07:30:36.544 [ERROR] Executor reports task instance 
<TaskInstance: **task name** 2022-06-23 07:00:00+00:00 [queued]> finished (failed) 
although the task says its queued. Was the task killed externally? 
None
[2022-06-23 06:08:33,202] {models.py:1758} INFO - Marking task as UP_FOR_RETRY
2022-06-23 06:08:33.202 [INFO] Marking task as UP_FOR_RETRY

What I have tried so far (roughly the commands sketched after this list):

  • Restarted the webserver
  • Started the webserver on 3 different ports
  • Re-ran the backfill command with 3 different timestamps
  • Deleted the DAG runs for my DAG, created a new DAG run, and then re-ran the backfill command
  • Cleared the PID as described in "How do I restart airflow webserver?" and restarted the webserver
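For reference, those attempts map roughly onto the following commands. This is only a sketch, assuming Airflow 1.x CLI syntax; the DAG id my_dag, port 8081, and the date range are placeholders, not my real values:

cat $AIRFLOW_HOME/airflow-webserver.pid | xargs kill -9    # stop the stale webserver process
airflow webserver -p 8081                                  # restart the webserver on another port
airflow clear my_dag -s 2022-06-21 -e 2022-06-23           # clear the stuck task instances for a date range
airflow backfill my_dag -s 2022-06-21 -e 2022-06-23        # re-run the backfill for the same range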

None of these worked. The issue has persisted for the past two days; I'd appreciate any help. At this point I'm guessing it has something to do with a shared DB, but I'm not sure how to fix it.

<<update>> I also found that these tasks eventually go to a success or failure state. When that happens the logs are available, but there are still no logs for the retry attempts in $AIRFLOW_HOME or our remote log directory.


1 Answer


The issue was that there was another Celery worker listening on the same queue. Since this second worker was not configured properly, it was failing the tasks and not writing the logs to the remote location.
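If you hit the same symptom, one way to confirm it (a rough sketch, assuming the CeleryExecutor and the Celery app that ships with Airflow at airflow.executors.celery_executor; the queue name below is only an example) is to ask Celery which workers are consuming each queue:

celery -A airflow.executors.celery_executor inspect active_queues   # lists every worker and the queues it consumes from
# stop the stray/misconfigured worker on its host, then restart only the intended one, e.g.
airflow worker -q default                                           # Airflow 1.x syntax; "default" is an example queue name

Any worker you don't recognize listed against your queue is a candidate for the one silently picking up (and failing) your tasks.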
