
I have installed Airflow on a server running Ubuntu and Python 3.8. I'm trying to import a simple DAG in the Airflow UI that lists the files in an S3 bucket.

from airflow import DAG
from airflow.providers.amazon.aws.operators.s3_copy_object import S3CopyObjectOperator
from airflow.providers.amazon.aws.operators.s3_list import S3ListOperator
from airflow.operators.python import PythonOperator
from airflow.operators.bash import BashOperator

from datetime import datetime

default_args = {
    'start_date': datetime(2020, 1, 1)
}


with DAG('S3_to_S3', schedule_interval='@daily', default_args=default_args, catchup=False) as dag:
    # List all keys in the bucket using the stored AWS connection
    list_files = S3ListOperator(
        task_id='list_S3_bucket',
        aws_conn_id='S3_Connection',
        bucket='xxxxx'
    )

    list_files

But it fails to import in the Airflow UI and throws this exception:

  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/airflow/airflow/dags/S3_to_S3.py", line 4, in <module>
    from airflow.providers.amazon.aws.operators.s3_list import S3ListOperator
ModuleNotFoundError: No module named 'airflow.providers.amazon'

I have already installed the Amazon provider package and its dependencies using pip, but the Airflow UI fails to find them. To verify that the provider is installed, I imported the package in a Python console, and it imported successfully:

Python 3.8.9
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from airflow.providers.amazon.aws.operators.s3_list import S3ListOperator
>>> from airflow.providers.amazon.aws.operators.s3_copy_object import S3CopyObjectOperator
>>> import airflow.providers.amazon
>>>
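
A successful import in a console only proves that the package is visible on that interpreter's sys.path for that user, while Airflow's scheduler may see a different path. Below is a minimal diagnostic sketch for checking where the module actually resolves from and which directories the interpreter searches (the printed paths will differ per machine):

import sys
import airflow.providers.amazon as amazon_provider

# Directory (or directories) the provider package was loaded from
print(list(amazon_provider.__path__))

# Directories this interpreter searches for imports; the provider's
# install location must appear here for the import to succeed
for path in sys.path:
    print(path)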

Do I need to change any configuration in Airflow or set an environment variable?

  • What executor do you use? I assume you use Celery? You need to install the provider package on all workers. It seems that you installed it only on the machine that runs the scheduler. – Elad Kalif May 23 '21 at 07:34
  • I'm using the Sequential Executor. Currently we are running a POC on Airflow, so we have installed it on only one node. All the services (scheduler, webserver, database) are running on that node. I have only changed a couple of things in airflow.cfg: executor = SequentialExecutor, sql_alchemy_conn = postgresql+psycopg2://xxx:xxxx@localhost:5432/airflow, load_examples = False – Shubam sachdeva May 23 '21 at 08:26
  • Could it be that you installed Airflow in --editable mode? https://github.com/apache/airflow/issues/13603#issuecomment-757923733 – Elad Kalif May 23 '21 at 08:28
  • Thanks for the response. I haven't installed Airflow in editable mode, but I went through the article and saw that some of my providers are installed in site-packages and some in dist-packages, which I guess creates the problem. I'm not sure how to resolve it, because when I use `pip install apache-airflow-providers-amazon` it installs the package under /home/airflow/.local/lib/python3.8/site-packages/ and not under /usr/local/lib/python3.8/dist-packages. How do I fix it? – Shubam sachdeva May 24 '21 at 06:28
  • `xxxxx:~$ ls /usr/local/lib/python3.8/dist-packages | grep apache_airflow_providers_*`
    `apache_airflow_providers_ftp-1.1.0.dist-info`, `apache_airflow_providers_http-1.1.1.dist-info`, `apache_airflow_providers_imap-1.0.1.dist-info`, `apache_airflow_providers_postgres-1.0.2.dist-info`, `apache_airflow_providers_slack-3.0.0.dist-info`
    `xxxx:~$ ls .local/lib/python3.8/site-packages/ | grep apache_airflow_providers_*`
    `apache_airflow_providers_amazon-1.4.0.dist-info`, `apache_airflow_providers_exasol-1.1.1.dist-info`, `apache_airflow_providers_google-3.0.0.dist-info`, `apache_airflow_providers_ssh-1.3.0.dist-info` – Shubam sachdeva May 24 '21 at 06:29
  • Maybe https://stackoverflow.com/questions/2915471/install-a-python-package-into-a-different-directory-using-pip will help you – Elad Kalif May 24 '21 at 06:38
  • Thank you so much. The problem is fixed. It looks like Airflow picks up the provider packages from /usr/local/lib/python3.8/dist-packages. When installing with the plain `pip install apache-airflow-providers-amazon`, pip could not write to /usr/local/lib/python3.8/dist-packages, so it installed under .local/lib/python3.8/site-packages/ instead. I deleted all the packages from site-packages and installed them using `sudo python3 -m pip install apache-airflow-providers-amazon`, which installed the package in /usr/local/lib/python3.8/dist-packages. Now I don't see the error in the Airflow UI. – Shubam sachdeva May 24 '21 at 07:12
  • Please add your answer. This helped a lot. Thanks! – Shubam sachdeva May 24 '21 at 07:12

1 Answer


As discussed in the comments, the issue happens because the provider is installed in a different path than Airflow itself, so Airflow cannot find the provider library:

/usr/local/lib/python3.8/dist-packages (where Airflow and the other providers live)

.local/lib/python3.8/site-packages/ (where pip placed the amazon provider)
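
You can confirm where pip placed a given distribution with `pip show`, whose output includes a `Location:` field naming the install directory. Run it with the same interpreter Airflow uses:

python3 -m pip show apache-airflow-providers-amazon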

The solution is to clean up the environment and install the provider into the same path as Airflow. This was also discussed in a GitHub issue (linked in the comments above).
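
For example, a sketch of the clean-up the question's author reported (the package name is the one from the question; `sudo` is what makes pip write to the system-wide dist-packages instead of the per-user site-packages):

# remove the copy that landed under ~/.local/lib/python3.8/site-packages
python3 -m pip uninstall apache-airflow-providers-amazon

# reinstall into /usr/local/lib/python3.8/dist-packages, where this
# Airflow installation resolves its packages from
sudo python3 -m pip install apache-airflow-providers-amazon

# verify that Airflow now sees the provider
airflow providers list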

Elad Kalif
  • I would recommend using `sudo python3 -m pip install apache-airflow-providers-amazon`, which is a much safer version than using `pip install apache-airflow-providers-amazon` – Shubam sachdeva May 25 '21 at 10:10