
I'm getting this SIGTERM error on Airflow 1.10.11 using the LocalExecutor.

[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.

The DAG task does this:

  1. Reads some data from SQL Server (on Windows) into a pandas DataFrame.
  2. Then writes it to a file (it never even gets to this part).

The strange thing is that if I limit the number of rows returned by the query (say, TOP 100), the DAG succeeds.

If I run the Python code locally on my machine, it succeeds. I'm using pyodbc and SQLAlchemy. It fails on this line after only 20 or 30 seconds:

df_query_results = pd.read_sql(sql_query, engine)
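
For reference, here is a minimal sketch of what the operator around that line might look like. Only the pd.read_sql call and the file name sql_to_avro.py come from the traceback; the class name, connection URI, and output handling are assumptions for illustration:

    # Hypothetical reconstruction of operators/sql_to_avro.py -- only the
    # pd.read_sql call is taken from the actual traceback above.
    import pandas as pd
    from sqlalchemy import create_engine
    from airflow.models import BaseOperator
    from airflow.utils.decorators import apply_defaults

    class SqlToAvroOperator(BaseOperator):
        @apply_defaults
        def __init__(self, sql_query, conn_uri, output_path, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.sql_query = sql_query
            self.conn_uri = conn_uri
            self.output_path = output_path

        def execute(self, context):
            # pyodbc-backed SQL Server engine, e.g. "mssql+pyodbc://..."
            engine = create_engine(self.conn_uri)
            # Step 1: read into a DataFrame -- the task dies here after
            # only 20-30 seconds when the result set is large.
            df_query_results = pd.read_sql(self.sql_query, engine)
            # Step 2: write out (Avro in the real operator; CSV here as a
            # stand-in). This line is never reached.
            df_query_results.to_csv(self.output_path, index=False)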

Airflow log

[2020-09-21 10:26:51,210] {{helpers.py:325}} INFO - Sending Signals.SIGTERM to GPID xxx
[2020-09-21 10:26:51,210] {{taskinstance.py:955}} ERROR - Received SIGTERM. Terminating subprocesses.
[2020-09-21 10:26:51,804] {{taskinstance.py:1150}} ERROR - Task received SIGTERM signal

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/airflow/dags/operators/sql_to_avro.py", line 39, in execute
    df_query_results = pd.read_sql(sql_query, engine)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 436, in read_sql
    chunksize=chunksize,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/sql.py", line 1231, in read_query
    data = result.fetchall()
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1216, in fetchall
    e, None, None, self.cursor, self.context
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1478, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 153, in reraise
    raise value
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1211, in fetchall
    l = self.process_rows(self._fetchall_impl())
  File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/result.py", line 1161, in _fetchall_impl
    return self.cursor.fetchall()
  File "/usr/local/lib/python3.6/site-packages/airflow/models/taskinstance.py", line 957, in signal_handler
    raise AirflowException("Task received SIGTERM signal")
airflow.exceptions.AirflowException: Task received SIGTERM signal
[2020-09-21 10:26:51,813] {{taskinstance.py:1194}} INFO - Marking task as FAILED. 

EDIT: I missed this earlier, but there is a warning message about the hostname:

WARNING - The recorded hostname da2mgrl001d1.mycompany.corp does not match this instance's hostname airflow-mycompany-dev.i.mct360.com
  • Could there be a timeout being hit? The SQL query might run for a long time; limiting it shrinks the execution time, so the timeout is never reached. – jay.cs Sep 21 '20 at 16:55
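
If that timeout theory holds, one quick way to probe it (a sketch, not code from the question) is to stream the result set in chunks so no single fetchall() blocks for long:

    # Sketch: pass chunksize so pandas iterates the cursor in pieces
    # instead of doing one large fetchall(); each fetch stays short.
    import pandas as pd

    chunks = pd.read_sql(sql_query, engine, chunksize=10_000)
    df_query_results = pd.concat(chunks, ignore_index=True)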

1 Answer


I had a Linux/network engineer help out. Unfortunately, I don't know the full details, but the fix was to change the hostname_callable setting in airflow.cfg to hostname_callable = socket:gethostname. It was previously set to socket:getfqdn.
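
In airflow.cfg terms, that is (the setting lives in the [core] section in Airflow 1.10.x):

    [core]
    # was: hostname_callable = socket:getfqdn
    hostname_callable = socket:gethostname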

Note: I found a couple of different (possibly related?) questions where this was the resolution.
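
For context, the two callables can return different names for the same machine (short/instance name vs. DNS fully-qualified name), which matches the hostname mismatch in the warning above. You can compare what each returns on the worker:

    # Compare the two hostname callables on this host.
    import socket

    print(socket.gethostname())  # instance/short hostname
    print(socket.getfqdn())      # DNS fully-qualified name; can differ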
