One of my Airflow DAGs runs without any issues most of the time. However, every now and then (at intervals of more than 3 hours), it "freezes".
In this state, its tasks are not "queued" (see attached image), and the timeouts set on specific tasks do not trigger either. The only way out of this state is for me to manually mark the run as failed.
This failure is always immediately followed by another failure (see the blank cells in the image).
What should I look for in the logs and/or what are other ways of debugging this?