I've experienced this before as well. I believe it's caused by an HTTP request that takes longer than expected for the webserver's gunicorn worker to fulfill. For example, if you set the DAG tree view to a high setting like 365 DAG runs for a DAG with a lot of tasks, you may be able to reproduce this consistently.
Can you try bumping up the timeout settings on the webserver to see if it makes a difference?
- First, try increasing
web_server_worker_timeout
(default = 120 seconds) under the [webserver]
group.
- If that doesn't resolve it, you might also try increasing
web_server_master_timeout
under the same group.
- Another technique to try is switching the webserver
worker_class
(default = sync
) to eventlet
or gevent
.
Reference: https://github.com/apache/incubator-airflow/blob/c27098b8d31fee7177f37108a6c2fb7c7ad37170/airflow/config_templates/default_airflow.cfg#L225-L229
Note that the alternative worker classes require installing Airflow with the async
extras like:
pip install apache-airflow[async]
You can find more info about gunicorn worker timeouts in this question: How to resolve the gunicorn critical worker timeout error?.