
I deployed the webserver, scheduler, Flower, and worker on ECS Fargate using a Docker image. I am using the latest Airflow version, 2.1.2. When I run a DAG, the worker node throws the following error:

[2021-08-13 11:38:45,323: ERROR/ForkPoolWorker-7] Task airflow.executors.celery_executor.execute_command[c22087fe-52e7-402d-bc89-d341e37f56e9] raised unexpected: AirflowException('Celery command failed on host: ip-172-30-1-180.ec2.internal')
Traceback (most recent call last):
  File "/root/.local/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/root/.local/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 88, in execute_command
    _execute_in_fork(command_to_exec)
  File "/usr/local/lib/python3.7/site-packages/airflow/executors/celery_executor.py", line 99, in _execute_in_fork
    raise AirflowException('Celery command failed on host: ' + get_hostname())
airflow.exceptions.AirflowException: Celery command failed on host: ip-172-XX-X-XX.ec2.internal

I tried changing the hostname on the worker in ECS Fargate, but the change is not reflected, and if I restart, the task gets killed. Can someone help me?
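To illustrate what I would try next on the Airflow side (a sketch only, not something I have deployed yet): the hostname in that error comes from the `[core] hostname_callable` setting, which defaults to `socket.getfqdn` and can be pointed at a custom function shipped with the image. The module name `custom_hostname` below is just a placeholder:

```python
# custom_hostname.py -- placeholder module that would need to be on PYTHONPATH
# in every container (webserver, scheduler, worker).
#
# Airflow would then be told to use it via an environment variable on each service:
#   AIRFLOW__CORE__HOSTNAME_CALLABLE=custom_hostname.resolve_hostname
# (the Airflow 2.1 default is socket.getfqdn)
import socket


def resolve_hostname() -> str:
    # Report the container's routable IP instead of the ECS-generated hostname,
    # so the webserver can reach the worker's log server (port 8793 by default).
    return socket.gethostbyname(socket.gethostname())
```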

  • Could you share the logs found in the worker for the failed task? – NicoE Aug 13 '21 at 13:24
  • sharing the screenshot. – Mani Aug 13 '21 at 13:54
  • In your worker node, browse this path: `/logs/{dag_id}/{task_id}/{execution_date}/{try_number}.log`; there you should find the log with the execution detail that caused the Celery command to fail. – NicoE Aug 13 '21 at 14:21
  • 1
    Unfortunately that log is also not created. I am not sure what went wrong. When I am starting the task, in next second it is throwing this error in the ouput. ``` *** Log file does not exist: /usr/local/airflow/logs/capiq_comp_pipeline_newone_1/create_cluster/2021-08-13T15:13:15.025680+00:00/1.log *** Fetching from: http://:8793/log/capiq_comp_pipeline_newone_1/create_cluster/2021-08-13T15:13:15.025680+00:00/1.log *** Failed to fetch log file from worker. The request to ':///' is missing either an 'http://' or 'https://' protocol. ``` – Mani Aug 13 '21 at 15:15
  • Apart from this log, nothing is shown in the output of any service. – Mani Aug 13 '21 at 15:16
  • I would suggest going over the basics: make sure there are no differences between the worker nodes and the scheduler. They should have the same version of your code, meet the same dependencies, and share the same Airflow settings. Check [this answer](https://stackoverflow.com/a/68198920/10569220) for some examples. – NicoE Aug 13 '21 at 17:12
  • Were you able to fix this issue? – Bhavani Ravi Nov 19 '21 at 10:01
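Following up on the suggestion in the comments about keeping the services in sync: one quick way to compare, assuming shell access to both the scheduler and worker containers, is to dump the effective settings on each side and diff the two outputs. This small script only reads the configuration, and the list of keys is just a starting point:

```python
# compare_config.py -- run inside both the scheduler and the worker container,
# then diff the two outputs to spot settings that differ between services.
# Note: broker_url and result_backend may contain credentials; mask them before sharing.
from airflow.configuration import conf

KEYS = [
    ("core", "executor"),
    ("core", "hostname_callable"),
    ("logging", "base_log_folder"),
    ("logging", "remote_logging"),
    ("celery", "broker_url"),
    ("celery", "result_backend"),
    ("celery", "worker_log_server_port"),
]

for section, key in KEYS:
    print(f"{section}.{key} = {conf.get(section, key, fallback='<unset>')}")
```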

0 Answers