I am trying to build an AWS SageMaker pipeline. In my root directory I have a process.py script and a utils.py script. In process.py I import additional functions from utils.py. When I run the pipeline, the processing job fails because it can't find the utils module:
File "/opt/ml/processing/input/code/process.py", line 4, in <module>
    from utils import (
ModuleNotFoundError: No module named 'utils'
I have set up the FrameworkProcessor as follows:
sklearn_processor = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version=framework_version,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    sagemaker_session=sagemaker_session,
    image_uri=image_uri,
    role=role,
)

step_args = sklearn_processor.run(
    inputs=[
        ProcessingInput(source=s3_bucket, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
    code="process.py",
)

step_process = ProcessingStep(
    name="process-step",
    step_args=step_args,
)
I know that in this question the answer was to specify source_dir, but in my case utils.py and process.py are already in the same directory (namely the root). Do I need to specify source_dir as the root anyway? If so, how would I do that?
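In other words, is something like the following what's needed? This is only my guess at the call; I'm assuming source_dir="." would upload the whole root directory (process.py plus utils.py) rather than just the single script:

```python
# Guess: pass source_dir so the entire directory is packaged and uploaded
# to /opt/ml/processing/input/code, not just process.py on its own.
step_args = sklearn_processor.run(
    code="process.py",   # entry point, relative to source_dir
    source_dir=".",      # root dir containing both process.py and utils.py
    inputs=[
        ProcessingInput(source=s3_bucket, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
)
```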