
Is there any way to import custom Python modules into a DAG file without mixing DAG environs and sys.path? I can't use something like

import sys
from os import environ

environ["PROJECT_HOME"] = "/path/to/some/project/files"
# import certain project files
sys.path.append(environ["PROJECT_HOME"])
import mymodule

because sys.path is shared among all DAGs, and this causes problems (e.g. values leaking between DAG definitions) if different DAG definitions need to import same-named modules from different places (and with many DAGs, this is hard to keep track of).

The docs for using packaged DAGs (which seemed like a solution) do not appear to avoid the problem:

the zip file will be inserted at the beginning of module search list (sys.path) and as such it will be available to any other code that resides within the same interpreter.

Anyone with more airflow knowledge know how to handle this kind of situation?

* Differs from linked-to question in that it is less specific about implementation

lampShadesDrifter

1 Answer


Ended up doing something like this:

import imp
import os

# PROJECT_HOME and file_name are set elsewhere in the DAG definition
module_path = "%s/path/to/specific/module/%s.py" % (PROJECT_HOME, file_name)
if os.path.isfile(module_path):
    # Load the module directly from its file path, without touching sys.path
    f = imp.load_source("custom_module", module_path)
    df = f.myfunc(sparkSession, df)

This fetches the needed module file explicitly from a known path, based on the SO post here.
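Note that the imp module is deprecated (and removed in Python 3.12). A sketch of the same path-based loading with importlib, using a throwaway module written to a temp directory for illustration (the module name and contents here are hypothetical):

```python
import importlib.util
import os
import sys
import tempfile

# Hypothetical stand-in for a project module on disk.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule.py")
with open(module_path, "w") as fh:
    fh.write("def myfunc(x):\n    return x * 2\n")

# Load the file by explicit path. Using a DAG-specific name avoids
# collisions when different DAGs have same-named module files.
spec = importlib.util.spec_from_file_location("dag_specific_mymodule", module_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

print(mod.myfunc(21))                  # → 42
print("mymodule" in sys.modules)       # → False; sys.path is untouched too
```

Because the module is never registered in sys.modules (and sys.path is never modified), each DAG file can load its own copy without affecting the others.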

lampShadesDrifter