In my Airflow DAG, I have set up an `on_failure_callback` function that pushes exceptions to a Slack integration. I pass in the context of the task and extract the exception from the context using `context.get('exception')`, as described in this answer.
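For reference, here is a minimal sketch of my setup (the Slack call and the function names are simplified placeholders, not my actual integration code):

```python
def post_to_slack(message):
    # Stand-in for my actual Slack integration (webhook call etc.).
    print(message)


def notify_slack_on_failure(context):
    """on_failure_callback: push the failure reason of a task to Slack."""
    exception = context.get('exception')              # only the final exception
    task_id = context['task_instance'].task_id
    post_to_slack(f"Task {task_id} failed: {exception}")


# Attached to the DAG via:
# default_args = {'on_failure_callback': notify_slack_on_failure}
```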
The problem is that it does not show the real cause of the exception. The real cause is in the INFO section of the Airflow logs, while the ERROR section only contains the follow-on errors that were caused by the real problem. For example, below is a sample log printed when I throw a custom error:
[2021-12-22 13:53:46,006] {pod_launcher.py:173} INFO - Event: transform-file-schema-6ba2b26845da43daa1b59ca5b221c839 had an event of type Pending
[2021-12-22 13:53:46,006] {pod_launcher.py:139} WARNING - Pod not yet started: transform-file-schema-6ba2b26845da43daa1b59ca5b221c839
[2021-12-22 13:53:47,017] {pod_launcher.py:173} INFO - Event: transform-file-schema-6ba2b26845da43daa1b59ca5b221c839 had an event of type Running
[2021-12-22 13:53:47,063] {pod_launcher.py:156} INFO - b'ERROR:root:Apatha throw error\n'
[2021-12-22 13:53:47,064] {pod_launcher.py:156} INFO - b'Traceback (most recent call last):\n'
[2021-12-22 13:53:47,064] {pod_launcher.py:156} INFO - b' File "job_transform_file_schema.py", line 8, in <module>\n'
[2021-12-22 13:53:47,064] {pod_launcher.py:156} INFO - b' JobRunner().run(sys.argv[1:])\n'
[2021-12-22 13:53:47,064] {pod_launcher.py:156} INFO - b' File "/usr/local/lib/python3.7/dist-packages/id_intl_dataflow/transform_file_schema.py", line 448, in run\n'
[2021-12-22 13:53:47,065] {pod_launcher.py:156} INFO - b' raise ApathaError("throw custom error")\n'
[2021-12-22 13:53:47,065] {pod_launcher.py:156} INFO - b'id_intl_dataflow.transform_file_schema.ApathaError: throw custom error\n'
[2021-12-22 13:53:48,080] {pod_launcher.py:160} INFO - Container transform-file-schema-6ba2b26845da43daa1b59ca5b221c839 has state running
[2021-12-22 13:53:50,154] {pod_launcher.py:267} INFO - Running command... cat /airflow/xcom/return.json
[2021-12-22 13:53:50,201] {pod_launcher.py:274} INFO - cat: can't open '/airflow/xcom/return.json': No such file or directory
[2021-12-22 13:53:50,202] {pod_launcher.py:267} INFO - Running command... kill -s SIGINT
[2021-12-22 13:54:16,091] {taskinstance.py:1152} ERROR - Pod Launching failed: Failed to extract xcom from pod: transform-file-schema-6ba2b26845da43daa1b59ca5b221c839
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 361, in execute
    final_state, _, result = self.create_new_pod_for_operator(labels, launcher)
  File "/usr/local/lib/airflow/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 508, in create_new_pod_for_operator
    final_state, result = launcher.monitor_pod(pod=self.pod, get_logs=self.get_logs)
  File "/usr/local/lib/airflow/airflow/kubernetes/pod_launcher.py", line 162, in monitor_pod
    result = self._extract_xcom(pod)
  File "/usr/local/lib/airflow/airflow/kubernetes/pod_launcher.py", line 262, in _extract_xcom
    raise AirflowException('Failed to extract xcom from pod: {}'.format(pod.metadata.name))
airflow.exceptions.AirflowException: Failed to extract xcom from pod: transform-file-schema-6ba2b26845da43daa1b59ca5b221c839
During handling of the above exception, another exception occurred:
As you can see, the real reason for the error is in the INFO section:
[2021-12-22 13:53:47,063] {pod_launcher.py:156} INFO - b'ERROR:root:Apatha throw error\n'
Because of this, `context.get('exception')` does not return the true reason for the failure. What do I need to change so that `context.get('exception')` also contains the INFO logs? Alternatively, what other variable in `context` can I use to get the INFO logs that hold the root cause of the issue?
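To illustrate what I am after, something along these lines (the `log_url` part is just my assumption of a fallback; I don't know whether the pod's INFO output is reachable from `context` at all):

```python
def notify_slack_on_failure(context):
    ti = context['task_instance']
    exception = context.get('exception')     # currently only the xcom AirflowException

    # What I would like: the INFO lines from the pod (e.g. the ApathaError traceback).
    # Is there a context key or task-instance attribute that exposes them?
    message = (
        f"Task {ti.task_id} failed: {exception}\n"
        f"Full log: {ti.log_url}"             # assumed fallback: link to the task log
    )
    post_to_slack(message)                    # same placeholder as in the sketch above
```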