If running a dask-cloudprovider
ECSCluster
or FargateCluster
the concurrent.futures._base.CancelledError
can result from a long-running step in computation where there is no output (logging or otherwise) to the Client
. In these cases, due to the lack of interaction with the client, the scheduler regards itself as "idle" and times out after the configured cloudprovider.ecs.scheduler_timeout
period, which defaults to 5 minutes. The CancelledError error message is misleading, but if you look in the logs for the scheduler task itself it will record the idle timeout.
The solution is to set scheduler_timeout
to a higher value, either via config or by passing directly to the ECSCluster
/FargateCluster
constructor.