
I'm going to start using AWS-managed Airflow (MWAA). For MWAA to access my DAGs, I need to upload my code to the dags/ directory in an S3 bucket, and MWAA will pick it up from there.

However, my codebase also has code in other directories, for example the tasks/ directory. The problem is that when I upload the tasks/ folder to the S3 bucket, MWAA doesn't pick it up, and I get import errors in my DAG.

AWS documentation doesn't provide any guidance for this. I wonder if anyone has done this before? Or do I have to upload all of my code into the dags/ folder?

taraf

2 Answers


As of Airflow v2.x, custom code/modules should be imported as regular Python modules. Specifically:

What's changed in v2

In v2 and above, the recommended approach is to place them in the DAGs directory and create and use an .airflowignore file to exclude them from being parsed as DAGs.

Reference: https://docs.aws.amazon.com/mwaa/latest/userguide/configuring-dag-import-plugins.html#configuring-dag-plugins-changed

Or do I have to upload all of my code into the dags/ folder?

Yes, you need to upload your code into the dags/ folder. You'll also need to update impacted import statements.

Before:

s3://<bucket>/dags/dag1.py
s3://<bucket>/dags/dag2.py
s3://<bucket>/tasks/task1.py
s3://<bucket>/tasks/task2.py

After:

s3://<bucket>/dags/dag1.py
s3://<bucket>/dags/dag2.py
s3://<bucket>/dags/tasks/task1.py
s3://<bucket>/dags/tasks/task2.py
s3://<bucket>/dags/.airflowignore
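
With the "After" layout, a DAG file such as dag1.py can import the task code as `from tasks.task1 import ...`, because Airflow adds the DAGs folder to `sys.path`. The sketch below rebuilds that layout in a temporary local directory and imports a task module the same way. The `run` function, the `__init__.py` file, and the `tasks/` pattern in .airflowignore are illustrative assumptions, not part of the original listing.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_sample_layout(root: Path) -> Path:
    """Create a local copy of the 'After' layout and return the dags/ path."""
    dags = root / "dags"
    tasks = dags / "tasks"
    tasks.mkdir(parents=True)
    # Assumption: an __init__.py makes tasks/ an explicit package.
    (tasks / "__init__.py").write_text("")
    # Hypothetical task module with a single function.
    (tasks / "task1.py").write_text("def run():\n    return 'task1 ran'\n")
    # Assumed pattern: ask the DAG parser to skip everything under tasks/.
    (dags / ".airflowignore").write_text("tasks/\n")
    return dags


with tempfile.TemporaryDirectory() as tmp:
    dags_dir = build_sample_layout(Path(tmp))
    # Mimic Airflow, which puts the DAGs folder on sys.path.
    sys.path.insert(0, str(dags_dir))
    try:
        # Equivalent to `from tasks import task1` inside dag1.py.
        task1 = importlib.import_module("tasks.task1")
        result = task1.run()
    finally:
        sys.path.remove(str(dags_dir))

print(result)
```

Note that the import path starts at `tasks`, not `dags.tasks`, since dags/ itself is the search root.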
Andrew Nguonly
  • This should be the correct answer. It's also consistent with Airflow suggested structure: https://airflow.apache.org/docs/apache-airflow/stable/modules_management.html#typical-structure-of-packages – Andy Mar 24 '22 at 20:00

On MWAA with Airflow 1.10, we add any custom code that our DAGs need as plugins. Currently, DAGs can only be deployed from the S3 folder/key that you specify. We keep our custom code in a separate folder, zip it, and configure that zip file as the plugins location in MWAA.

I'm not sure what your tasks folder holds or what sort of code it contains, but below is an example of our code structure.

src
    ----> dags (dir)
              ----> dag1.py
              ----> dag2.py
              ----> dag3.py
    ----> plugins (dir)
              ----> __init__.py
              ----> common (dir)
                        ----> __init__.py
                        ----> something.py
              ----> hooks (dir)
                        ----> __init__.py
                        ----> somehook.py
              ----> operators (dir)
                        ----> __init__.py
                        ----> op1.py
                        ----> op2.py

With this structure, the DAGs are deployed as-is to the dags folder. The plugins folder is zipped in its entirety and uploaded to S3, at the location configured as the plugins location in MWAA.
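
The zipping step could be scripted roughly as follows. This is a minimal sketch, not our exact tooling: the `zip_plugins` helper is a hypothetical name, and it assumes the archive should hold the contents of plugins/ at its root rather than the plugins/ folder itself. Uploading the resulting file to S3 is left out.

```python
import zipfile
from pathlib import Path
from typing import List


def zip_plugins(plugins_dir: Path, out_zip: Path) -> List[str]:
    """Zip every file under plugins_dir and return the archived names.

    Names inside the archive are relative to plugins_dir, so
    plugins/operators/op1.py is stored as operators/op1.py.
    out_zip should live outside plugins_dir so it is not archived too.
    """
    names = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(plugins_dir.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(plugins_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

For the structure above, calling `zip_plugins(Path("src/plugins"), Path("plugins.zip"))` would produce an archive with entries such as `common/something.py` and `operators/op1.py`.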

Emerson