
I have a custom transformer in a class that I used to train my model locally: it simply groups categories with small frequencies into an "other" value. After pickling my model and pipeline from an sklearn GridSearch, I have model.tar.gz in my S3 bucket containing model.joblib, pipeline.joblib, and inference.py. If I open my pipeline locally and try to transform my data, it works fine. Then, to deploy my model, I do the following.
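For context, the transformer class itself is not shown in the question; a minimal sketch of that kind of class, reusing the name that appears in the error below but with assumed column and threshold parameters, might look like this:

# Hypothetical sketch of the custom transformer referenced in the error below.
# The real RecodeCategorias implementation is not shown in the question;
# the columns, frequency threshold, and "other" label here are assumptions.
# X is assumed to be a pandas DataFrame.
from sklearn.base import BaseEstimator, TransformerMixin


class RecodeCategorias(BaseEstimator, TransformerMixin):
    def __init__(self, columns=None, min_freq=0.01, other_label="other"):
        self.columns = columns
        self.min_freq = min_freq
        self.other_label = other_label

    def fit(self, X, y=None):
        # Remember which categories are frequent enough to keep, per column.
        self.frequent_ = {}
        for col in self.columns or X.columns:
            freqs = X[col].value_counts(normalize=True)
            self.frequent_[col] = set(freqs[freqs >= self.min_freq].index)
        return self

    def transform(self, X):
        X = X.copy()
        for col, keep in self.frequent_.items():
            # Replace rare categories with the "other" label.
            X[col] = X[col].where(X[col].isin(keep), self.other_label)
        return X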

def get_image_uri():
    # retrieve sklearn image
    image_uri = sagemaker.image_uris.retrieve(
        framework="sklearn",
        region=region,
        version="1.0-1",
        py_version="py3",
        image_scope="inference",
    )

    return image_uri


def upload_model():
    # Upload tar.gz to bucket
    # response = s3.meta.client.upload_file(
    #    "model.tar.gz", s3_bucket, f"{s3_bucket_folder}/model.tar.gz"
    # )
    logging.info(f"""Uploading  model com parĂ¢metros:
                File name {s3_file_name}
                S3 bucket: {s3_bucket}
                S3 key: {s3_key}""")
    response = s3_client.upload_file(
        s3_file_name,
        s3_bucket,
        s3_key
    )
    logging.info(f"response: {response}")
    logging.info("model artefacts uploaded")

def create_model(image_uri):
    # Step 1: Model Creation
    logging.info("Create model Model name: " + model_name)
    logging.info(model_artifacts)
    create_model_response = sgmk_client.create_model(
        ModelName=model_name,
        Containers=[
            {
                "Image": image_uri,
                "Mode": "SingleModel",
                "ModelDataUrl": model_artifacts,
                "Environment": {
                    "SAGEMAKER_SUBMIT_DIRECTORY": model_artifacts,
                    "SAGEMAKER_PROGRAM": "inference.py",
                },
            }
        ],
        ExecutionRoleArn=role,
    )
    print("Model Arn: " + create_model_response["ModelArn"])


def create_endpoint_config():
    # Step 2: EPC Creation
    endpoint_config_response = sgmk_client.create_endpoint_config(
        EndpointConfigName=endpoint_config_name,
        ProductionVariants=[
            {
                "VariantName": "variant1",
                "ModelName": model_name,
                # "InitialInstanceCount": 1,
                # "InstanceType": "ml.t2.medium",
                "ServerlessConfig": {
                    "MemorySizeInMB": 4096,
                    "MaxConcurrency": 100,
                },
            }
        ],
    )
    print(
        "Endpoint Configuration Arn: " +
        endpoint_config_response["EndpointConfigArn"]
    )


def create_endpoint():
    sgmk_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=endpoint_config_name,
    )

It creates everything fine, but the endpoint fails to come up. In the CloudWatch logs, I get:

 sagemaker_containers._errors.ClientError: Can't get attribute 'RecodeCategorias' on <module '__main__' from '/miniconda3/bin/gunicorn'>
 AttributeError: Can't get attribute 'RecodeCategorias' on <module '__main__' from '/miniconda3/bin/gunicorn'>

I have found similar issues, but they use a different framework that can pass dependencies to SageMaker. How can I achieve the same using boto3? I have tried adding my classes.py to the model artifacts, but it did not work.
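For what it's worth, the error itself comes from how the pipeline was pickled: joblib recorded the class as __main__.RecodeCategorias because it was defined in the training script or notebook, and the serving container's __main__ is gunicorn, which has no such attribute. One way to work around this with the boto3 deployment above, assuming classes.py (defining RecodeCategorias) is packaged inside model.tar.gz next to inference.py, is to re-register the class on __main__ at the top of inference.py before loading the artifacts. This is only a sketch of the idea, not an official SageMaker mechanism:

# inference.py -- sketch assuming classes.py ships inside model.tar.gz
# next to this file and defines RecodeCategorias.
import os
import sys
import __main__

import joblib

# Make the bundled module importable, then expose the class under __main__,
# because the pickle was created in a session where the class lived in __main__.
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from classes import RecodeCategorias  # noqa: E402

__main__.RecodeCategorias = RecodeCategorias


def model_fn(model_dir):
    # Load the fitted pipeline and model saved during training.
    pipeline = joblib.load(os.path.join(model_dir, "pipeline.joblib"))
    model = joblib.load(os.path.join(model_dir, "model.joblib"))
    return {"pipeline": pipeline, "model": model}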


1 Answer


I would recommend using the SageMaker Python SDK to do this and letting it take care of packaging the model with your custom code. Create an SKLearn model using the SKLearnModel class: https://sagemaker.readthedocs.io/en/stable/frameworks/sklearn/sagemaker.sklearn.html#scikit-learn-model

And then deploy the model.
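A minimal sketch of that approach, assuming the artifacts described in the question (model.tar.gz already in S3, inference.py as the entry point, classes.py holding the custom transformer) and a serverless endpoint matching the configuration above; the bucket path and role ARN are placeholders:

# Sketch using the SageMaker Python SDK; the S3 path, role ARN, and file
# names are taken from the question or assumed -- adjust to your setup.
from sagemaker.sklearn import SKLearnModel
from sagemaker.serverless import ServerlessInferenceConfig

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder

sklearn_model = SKLearnModel(
    model_data="s3://my-bucket/path/model.tar.gz",  # the uploaded artifacts
    role=role,
    entry_point="inference.py",        # loads the joblib files at serving time
    dependencies=["classes.py"],       # ships the custom transformer code
    framework_version="1.0-1",
    py_version="py3",
)

predictor = sklearn_model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=100,
    ),
)

The SDK repackages the entry point and dependencies into the model archive for you, which is exactly the part that is missing from the plain boto3 flow in the question.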

– Kirit Thadaka
  • I have been trying to do that but with no success. Can you give me a hint? https://stackoverflow.com/questions/75768789/deploy-a-custom-pipeline-using-sagemaker-sdk – Geo Mar 17 '23 at 15:08