I am new to Azure Machine Learning and have been struggling with importing modules into my run script. I am using the AzureML SDK for Python. I think I somehow have to append the script location to PYTHONPATH, but have been unable to do so.
To illustrate the problem, assume I have the following project directory:
project/
    src/
        utilities.py
        test.py
    run.py
    requirements.txt
I want to run test.py on a compute instance on AzureML and I submit the run via run.py. A simple version of run.py looks as follows:
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig
from azureml.core.compute import ComputeInstance
ws = Workspace.get(...) # my credentials here
exp = Experiment(ws, '<experiment-name>') # placeholder experiment name
env = Environment.from_pip_requirements(name='test-env', file_path='requirements.txt')
instance = ComputeInstance(ws, '<instance-name>')
config = ScriptRunConfig(source_directory='./src', script='test.py', environment=env, compute_target=instance)
run = exp.submit(config)
run.wait_for_completion()
Now, test.py imports functions from utilities.py, e.g.:
from src.utilities import test_func
test_func()
Then, when I submit a run, I get the error:
Traceback (most recent call last):
File "src/test.py", line 13, in <module>
from src.utilities import test_func
ModuleNotFoundError: No module named 'src.utilities'; 'src' is not a package
This looks like a standard error where the directory is not appended to the Python path. I tried two things to get rid of it:
- include an __init__.py file in src. This didn't work, and for various reasons I would also prefer not to use __init__.py files anyway.
- fiddle with the environment_variables passed to AzureML like so
env.environment_variables={'PYTHONPATH': f'./src:${{PYTHONPATH}}'}
but that didn't really work either, and I assume that is simply not the correct way to extend PYTHONPATH.
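For reference, here is what the f-string in the second attempt evaluates to. The doubled braces become literal braces, so the value handed to AzureML is a string that still contains a shell-style placeholder. I am assuming, without having confirmed it, that nothing on the AzureML side expands that placeholder, which might be part of why this attempt had no effect.

```python
# Doubled braces inside an f-string are emitted as literal braces,
# so no Python variable is interpolated here.
value = f'./src:${{PYTHONPATH}}'

# The resulting string keeps the placeholder verbatim.
print(value)  # ./src:${PYTHONPATH}
```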
I would greatly appreciate any suggestions on extending PYTHONPATH or any other ways to import modules when running a script in AzureML.
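One workaround I am considering, as a sketch only: have test.py put its own folder on sys.path and import utilities without the src prefix. The snippet below rebuilds the layout in a temporary directory and runs test.py locally from the project root. The body of test_func and the run_demo helper are made up for illustration, and I have not verified that AzureML executes the uploaded script the same way.

```python
import os
import subprocess
import sys
import tempfile
import textwrap


def run_demo():
    """Recreate project/src in a temp folder and run test.py as a script."""
    with tempfile.TemporaryDirectory() as project:
        src = os.path.join(project, "src")
        os.makedirs(src)

        # Placeholder utilities.py; the real test_func is not shown above.
        with open(os.path.join(src, "utilities.py"), "w") as f:
            f.write("def test_func():\n    print('hello from utilities')\n")

        # test.py adds its own directory to sys.path, then imports
        # utilities directly instead of going through 'src'.
        with open(os.path.join(src, "test.py"), "w") as f:
            f.write(textwrap.dedent("""\
                import os
                import sys
                sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
                from utilities import test_func
                test_func()
            """))

        # Run from the project root, similar to 'python src/test.py'.
        result = subprocess.run(
            [sys.executable, os.path.join(src, "test.py")],
            cwd=project,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()


demo_output = run_demo()
print(demo_output)
```

Locally this works without any __init__.py, because the import no longer depends on src being a package. Whether the same holds inside an AzureML run is exactly what I am unsure about.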