One of my Airflow DAGs runs without any issues most of the time. However, every now and then (no more often than every three hours or so), it "freezes".

In this state, its tasks are not "queued" (see attached image), and the timeouts set on specific tasks do not fire either. The only way out of this state is to manually mark the run as failed.

This failure is always followed by another immediate failure (see the blank cells in the image).

What should I look for in the logs and/or what are other ways of debugging this?

[Image: DAG runs; see the empty cells]

abhishekbh
  • [this](https://stackoverflow.com/q/55104705/3679900) looks similar – y2k-shubham Mar 12 '19 at 04:33
  • thanks for the link. I think one way my issue differs from the one above is that the scheduler does seem to be "running", or picking up DAG tasks (see the red dots in the first line of the image); however, it never seems to actually kick off the tasks, hence the blanks. – abhishekbh Mar 12 '19 at 19:38
  • Do you see anything suspicious in the scheduler log around the time the failed jobs ran? – SergiyKolesnikov Mar 13 '19 at 15:52
  • @SergiyKolesnikov nothing that stands out to be honest, there are no reported errors in those instances. The logs are just a little more cryptic than I would like them to be, anything specific I should be looking for? – abhishekbh Mar 14 '19 at 22:48

1 Answer

Found the issue: some tasks were running longer than the schedule interval, so consecutive runs ended up running in parallel.

I was hoping that in such cases airflow would provide some kind of feedback in the logs or UI, but that isn't the case.

Resolved.
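Since Airflow did not flag the overlap, one way to confirm this kind of problem after the fact is to collect each DAG run's start and end times (for example from the DAG Runs view in the UI or from the metadata database) and check whether any run started before the previous one finished. Below is a minimal stdlib sketch under that assumption; the function name `find_overlapping_runs`, the `(run_id, start, end)` tuple layout, and the sample hourly times are hypothetical illustrations, not part of Airflow.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical record layout: (run_id, start, end); end is None if the run never finished.
Run = Tuple[str, datetime, Optional[datetime]]


def find_overlapping_runs(runs: List[Run], now: datetime) -> List[Tuple[str, str]]:
    """Return (earlier_run_id, later_run_id) pairs whose time intervals overlap.

    A run with no end time is treated as still running at `now`.
    """
    # Sort by start time so each run only needs comparing with later-starting runs.
    ordered = sorted(runs, key=lambda r: r[1])
    overlaps = []
    for i, (run_id, start, end) in enumerate(ordered):
        effective_end = end if end is not None else now
        for other_id, other_start, _ in ordered[i + 1:]:
            if other_start >= effective_end:
                # Remaining runs start even later, so none of them overlap this run.
                break
            overlaps.append((run_id, other_id))
    return overlaps


# Hypothetical hourly schedule where one run took longer than the interval.
t0 = datetime(2019, 3, 12, 0, 0)
runs = [
    ("run_00", t0, t0 + timedelta(minutes=50)),
    ("run_01", t0 + timedelta(hours=1), t0 + timedelta(hours=2, minutes=30)),  # ran long
    ("run_02", t0 + timedelta(hours=2), None),  # never finished
]

print(find_overlapping_runs(runs, now=t0 + timedelta(hours=5)))  # [('run_01', 'run_02')]
```

If this confirms overlapping runs, setting `max_active_runs=1` on the DAG is one option that may help, since it limits the DAG to one active run at a time instead of letting a slow run collide with the next scheduled one.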

abhishekbh