I'm using Cloud Composer composer-1.10.6-airflow-1.10.6.
There seems to be a problem with Airflow starting tasks and then losing track of them. The logs stop, and the task eventually gets marked as failed, even though it actually completes successfully. This is a big problem for non-idempotent tasks (like tasks that append data) when retries are configured. Whenever this happens I have to manually investigate whether the task actually did complete or not, and mark the task instance accordingly.
Here's a typical log from this problem. There isn't much useful information in it; it just ends prematurely (hence it feels like Airflow has lost track of the job), while the job itself still completes successfully.
*** Reading remote log from gs://bucket/log/path/log.log
[2020-08-31 12:16:33,450] {taskinstance.py:630} INFO - Dependencies all met for <TaskInstance: builder.launch_loader_prd 2020-08-30T10:30:00+00:00 [queued]>
[2020-08-31 12:16:33,569] {taskinstance.py:630} INFO - Dependencies all met for <TaskInstance: builder.launch_loader_prd 2020-08-30T10:30:00+00:00 [queued]>
[2020-08-31 12:16:33,571] {taskinstance.py:841} INFO -
--------------------------------------------------------------------------------
[2020-08-31 12:16:33,572] {taskinstance.py:842} INFO - Starting attempt 1 of 1
[2020-08-31 12:16:33,572] {taskinstance.py:843} INFO -
--------------------------------------------------------------------------------
[2020-08-31 12:16:33,605] {taskinstance.py:862} INFO - Executing <Task(DataflowTemplateOperator): launch_loader_prd> on 2020-08-30T10:30:00+00:00
[2020-08-31 12:16:33,608] {base_task_runner.py:133} INFO - Running: ['airflow', 'run', 'builder', 'launch_loader_prd', '2020-08-30T10:30:00+00:00', '--job_id', '449104', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/mydag.py', '--cfg_path', '/tmp/tmpmzlheavp']
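
For context, the task is defined more or less like the sketch below. The template path, parameters, project and schedule are placeholders rather than our real values, and this particular run happened to have no retries ("attempt 1 of 1"), but on the DAGs where retries are enabled the setup looks like this; a retry simply re-launches the Dataflow template, which is why these false failures are dangerous for append-style jobs:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator

    default_args = {
        'owner': 'airflow',
        'start_date': datetime(2020, 8, 1),
        # On DAGs where retries are enabled, a "lost" task that actually
        # succeeded gets re-run and appends its data a second time.
        'retries': 1,
        'retry_delay': timedelta(minutes=5),
    }

    with DAG('builder',
             default_args=default_args,
             schedule_interval='30 10 * * *',
             catchup=False) as dag:

        launch_loader_prd = DataflowTemplateOperator(
            task_id='launch_loader_prd',
            template='gs://my-bucket/templates/loader',        # placeholder template path
            parameters={'env': 'prd'},                         # placeholder parameters
            dataflow_default_options={'project': 'my-project'} # placeholder project
        )
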
Any help is appreciated. Thanks.