Does anyone use MWAA (Amazon Managed Workflows for Apache Airflow) in production?
We currently run around 500 DAGs and are seeing unexpected behavior: some tasks stay in the "queued" state for no apparent reason.
The task log shows this message: "Task is in the 'queued' state which is not a valid state for execution. The task must be cleared in order to be run."
It happens randomly: everything can run perfectly for a day, and then a few tasks get stuck in "queued". They stay in that state forever unless we manually mark them as failed.
A DAG run can stay in this "queued" state even when the pool is empty, and I can't find anything that explains it.
It affects roughly 5% of tasks; all the others run smoothly.
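To make the symptom concrete, here is a minimal sketch of how stuck tasks could be flagged and the stuck fraction measured. It assumes task-instance records have already been exported into a hypothetical `TaskRecord` shape with `state` and `queued_at` fields; this is not the Airflow `TaskInstance` model or any MWAA API, and the 30-minute timeout is an arbitrary example value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class TaskRecord:
    # Hypothetical record shape for illustration only;
    # not Airflow's TaskInstance model.
    dag_id: str
    task_id: str
    state: str
    queued_at: Optional[datetime]


def find_stuck_queued(
    records: Iterable[TaskRecord],
    timeout: timedelta,
    now: Optional[datetime] = None,
) -> List[TaskRecord]:
    """Return records still in 'queued' for longer than `timeout`."""
    now = now or datetime.now(timezone.utc)
    return [
        r
        for r in records
        if r.state == "queued"
        and r.queued_at is not None
        and now - r.queued_at > timeout
    ]


def stuck_fraction(records: List[TaskRecord], stuck: List[TaskRecord]) -> float:
    """Share of all records that are stuck (0.0 for an empty list)."""
    return len(stuck) / len(records) if records else 0.0


# Usage with made-up records and a fixed "now" for reproducibility.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    TaskRecord("etl", "extract", "success", now - timedelta(hours=3)),
    TaskRecord("etl", "load", "queued", now - timedelta(hours=2)),
    TaskRecord("etl", "transform", "queued", now - timedelta(minutes=5)),
]
stuck = find_stuck_queued(records, timedelta(minutes=30), now=now)
print([r.task_id for r in stuck])  # ['load']
```

Only `load` is reported: `transform` is also queued, but for less than the timeout, so a timeout-based check avoids flagging tasks that are merely waiting for a worker.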
Has anyone else encountered this behavior?