
I am calling a weather API from a Python script, but the Airflow task fails with the error Negsignal.SIGSEGV. The same script works fine when run outside Airflow.

DAG

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from datetime import datetime, timedelta
from scripts.weather_analysis.data_collection import query_weather_data
import pendulum

local_tz = pendulum.timezone("Asia/Calcutta")

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    #'start_date': airflow.utils.dates.days_ago(2), --> doesn't work
    'start_date': datetime(2022, 8, 29, tzinfo=local_tz),
}


dag = DAG('weather_dag_2', default_args=default_args, schedule_interval ='0 * * * *',
    )

# Task to fetch weather data from the API
t1 = PythonOperator(
        task_id = 'callApi',
        python_callable = query_weather_data,
        dag=dag
    )

Python script - query_weather_data.py

import requests
import json
from scripts.weather_analysis.config import API_KEY
from datetime import datetime

def query_weather_data():

    parameters = {'q':'Brooklyn, USA', 'appId': API_KEY}
    result = requests.get("http://api.openweathermap.org/data/2.5/weather?",parameters)

    if result.status_code == 200:
        json_data = result.json()
        print(json_data)
    else:
        print("Unable to fetch api data")

Error Log:

[2022-09-02, 17:00:04 IST] {local_task_job.py:156} INFO - Task exited with return code Negsignal.SIGSEGV
[2022-09-02, 17:00:04 IST] {taskinstance.py:1407} INFO - Marking task as FAILED. dag_id=weather_dag_2, task_id=callApi, execution_date=20220902T103000, start_date=20220902T113004, end_date=20220902T113004

Environment details:

macOS Monterey

Airflow=2.3.4

Airflow deployment mode=Local

Python=3.10

I already tried the solution listed here: Airflow DAG fails when PythonOperator tries to call API and download data, but it doesn't solve my issue.

Please help.

  • I have the same problem using the `requests` library. The PythonOperator ends up calling native code through Rosetta. You can see the crash log in the `Console` app (the macOS system log viewer). I avoided this problem by running Airflow on another machine. – Artiya4u Oct 06 '22 at 06:48
  • I resolved it with this workaround: `os.environ["no_proxy"] = "*"` – Siddharth Kanojiya Oct 07 '22 at 07:08

2 Answers


I resolved it with this workaround:

In query_weather_data.py, set the environment variable before any `requests` call is made:

import os

os.environ["no_proxy"] = "*"
Syscall

I am afraid this is a problem with your machine. SIGSEGV indicates a serious problem with the environment you run Airflow on, not with Airflow itself. Neither Airflow nor your code (which might otherwise be the culprit) appears to use any low-level C code (Airflow certainly does not, and your code most likely does not), and that is the only way the application "code" itself could generate such a signal. If you are not using any other custom code, then your environment and deployment are almost certainly the problem.

There is not much Airflow can do about it; it seems that the Python environment used by Airflow is broken. This might be because of a wrong architecture (ARM vs. Intel with no emulation, for example), or because some libraries that your Python loads crash, but it has nothing to do with Airflow.
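
One quick way to check the architecture hypothesis is to run a few lines with the same Python interpreter that runs your Airflow tasks:

import platform
import sys

# On an Apple Silicon Mac, "x86_64" means the interpreter runs under Rosetta
# emulation; "arm64" means it is a native build.
print(platform.machine())

# Which interpreter and Python version Airflow is actually using.
print(sys.executable)
print(sys.version)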

You have not written how you are deploying Airflow, except that it is local, but my advice would be to recreate the environment from scratch: create a completely separate virtualenv for Airflow and install Airflow following the standard installation instructions (including constraints): https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html. However, if your Python installation itself is broken, you might need to nuke it and install it from scratch.

If you are using Docker images, make sure that you use the right images for your architecture (Intel/ARM), depending on whether you have an Intel or M1-based processor. Airflow Docker images as of 2.3.0 are published for both Intel and ARM, but you might have an old Intel-emulated Docker Desktop installed that does not cope well with a different architecture, so you might need to nuke the installation and reinstall it from scratch if that is the case.

Generally speaking: check whether you have any custom code of yours, and remove/disable it to see if it makes a difference; then progressively nuke everything you use:

  • virtualenv
  • python installation
  • docker environment

You can also go the other way round: get a basic "quick-start" Airflow installation working, then progressively add your customisation or change the deployment to get closer to what you have (for example, change the Python version), one step at a time. The moment it breaks, you will know the reason.
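
As an illustration of that approach, a minimal test DAG could look like the sketch below: the task only performs a trivial `requests.get`, so if this alone segfaults, the problem is in the environment rather than in the custom weather code (the DAG id and URL are arbitrary placeholders):

from datetime import datetime

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator


def ping():
    # No project-specific code involved, just a plain HTTP call.
    response = requests.get("https://example.com")
    print(response.status_code)


with DAG(
    dag_id="sigsegv_smoke_test",   # placeholder DAG id
    start_date=datetime(2022, 8, 29),
    schedule_interval=None,        # trigger manually while debugging
    catchup=False,
) as dag:
    PythonOperator(task_id="ping", python_callable=ping)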

If even the basic quick start does not work for you after you follow it rigorously and handle all the caveats described in the docs, you might even need to nuke and reinstall your OS, or in extreme cases fix the hardware (SIGSEGV often happens when memory or disk gets corrupted).

Jarek Potiuk