I am trying to create a scikit-learn (SKLearn) processing job in Amazon SageMaker to transform my input data before model training.
I wrote a custom Python script, preprocessing.py, that performs these transformations. The script imports a third-party Python package. Here is the SageMaker example I followed.
When I submit the processing job, it fails with this error:
Traceback (most recent call last):
  File "/opt/ml/processing/input/code/preprocessing.py", line 6, in <module>
    import snowflake.connector
ModuleNotFoundError: No module named 'snowflake.connector'
I understand that the processing job cannot find this package and that I need to install it. How can I do this through the SageMaker Processing Job API? Ideally I could pass a requirements.txt in the API call, but I don't see that option in the docs.
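Until such an option turns up, one possible workaround is to ship a requirements.txt to the job and have preprocessing.py install it before its other imports. The sketch below assumes two things: that the file was passed as an extra `ProcessingInput` whose destination is the hypothetical directory `/opt/ml/processing/input/requirements`, and that the container can reach PyPI (it cannot if network isolation is enabled or the job runs in a VPC without outbound internet access). `install_requirements` and `pip_install_command` are illustrative helpers, not SageMaker SDK functions.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical path: assumes requirements.txt was attached to the job as an
# extra ProcessingInput with destination /opt/ml/processing/input/requirements.
DEFAULT_REQUIREMENTS = Path("/opt/ml/processing/input/requirements/requirements.txt")


def pip_install_command(requirements: Path) -> List[str]:
    """Build a pip command that installs into the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "--quiet", "-r", str(requirements)]


def install_requirements(
    requirements: Path = DEFAULT_REQUIREMENTS,
    runner: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> bool:
    """Install packages listed in `requirements` if the file exists.

    Returns True if pip was invoked, False if no requirements file was found.
    `runner` defaults to subprocess.check_call, which raises on a non-zero exit.
    """
    if not requirements.is_file():
        return False
    runner(pip_install_command(requirements))
    return True


# Intended use at the very top of preprocessing.py, before third-party imports:
#     install_requirements()
#     import snowflake.connector
```

Using `sys.executable -m pip` rather than a bare `pip` makes the packages land in the same Python environment that runs the script, which matters in images that ship more than one interpreter.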
I know I could build a custom image with the required packages and use that image for the processing job, but that seems like a lot of work for something that should be built in.
Is there an easier, more elegant way to install the packages a SageMaker processing job needs?
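For comparison, a lighter-weight workaround than a custom image is to install a missing distribution from inside preprocessing.py only when its import fails. This is a minimal sketch, again assuming the job can reach PyPI; `ensure_module` is a hypothetical helper, not part of the SageMaker SDK. The pip distribution that provides `snowflake.connector` is `snowflake-connector-python`.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable, Sequence


def ensure_module(
    import_name: str,
    pip_spec: str,
    runner: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> ModuleType:
    """Import `import_name`, pip-installing `pip_spec` first if the import fails."""
    try:
        return importlib.import_module(import_name)
    except ModuleNotFoundError:
        # Install into the interpreter that is running this script.
        runner([sys.executable, "-m", "pip", "install", pip_spec])
        # Clear finder caches so packages installed after startup can be found.
        importlib.invalidate_caches()
        return importlib.import_module(import_name)


# Intended use near the top of preprocessing.py:
#     ensure_module("snowflake.connector", "snowflake-connector-python")
#     import snowflake.connector  # now resolves from the fresh install
```

The trade-off versus a custom image is that every job run pays the download and install time, and the installed version is whatever pip resolves at run time unless `pip_spec` pins it (for example `"snowflake-connector-python==<version>"`).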