I have an on-premise K8s cluster that is connected to Azure via Arc. I also have an Azure Machine Learning workspace with this cluster attached as a compute target. I have registered a model in the workspace and created an endpoint there. Now I want to create a deployment on this endpoint so that a scoring endpoint is provisioned on-premise, but the deployment doesn't go anywhere.

Below is the code for endpoint creation, which succeeded:

from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,    
    Model,
    Environment,
)

online_endpoint_name = "mnist-test-endpoint"

# create an online endpoint
endpoint = KubernetesOnlineEndpoint(
    name=online_endpoint_name,
    compute="onpremk8scompute",
    description="mnist_test_endpoint",
    auth_mode="key",
)

endpoint = ml_client.online_endpoints.begin_create_or_update(endpoint).result()

print(f"Endpoint {endpoint.name} provisioning state: {endpoint.provisioning_state}")

And the code for the deployment to the endpoint:

from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,    
    Model,
    Environment,
    CodeConfiguration,
    ResourceRequirementsSettings,
    ResourceSettings
)

model = ml_client.models.get("mnist_model_test", version=1)

model

deployment = KubernetesOnlineDeployment(
    name="mnist-deployment-v1",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=Environment(
        conda_file="08_conda_env.yml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04",
    ),
    code_configuration=CodeConfiguration(
        code="./inference_script", scoring_script="score.py"
    ),
    resources=ResourceRequirementsSettings(
        requests=ResourceSettings(
            cpu="100m",
            memory="0.5Gi",
        ),
    ),
    instance_count=1,
)

ml_client.begin_create_or_update(deployment).result()

This does not work. I also don't find any resources in the on-premise K8s namespace designated for this.

Ideas?

JakeUT

1 Answer

Azure ML scoring script deployment to azure arc K8s cluster not working

This might be due to the Kubernetes cluster being improperly set up to function with Azure Machine Learning.

Check the deployment logs with the command below:

az ml online-deployment get-logs --name <deployment-name> --endpoint-name <endpoint-name> -g <resource-group-name> -w <workspace-name>

This command retrieves the deployment's logs, so you can see whether any errors or issues are preventing it from succeeding.

Also, check the status of the deployment with the command below:

az ml online-deployment show --name <deployment-name> --endpoint-name <endpoint-name> -g <resource-group-name> -w <workspace-name>
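
The same checks can be made from the Python SDK, which is convenient when the deployment was created from a notebook. The following is a minimal sketch, assuming an already authenticated `ml_client` like the one in the question. `summarize_deployment` is a hypothetical helper name, not part of the SDK; it relies on `online_deployments.get` and `online_deployments.get_logs` from `azure-ai-ml`.

```python
def summarize_deployment(ml_client, endpoint_name, deployment_name, log_lines=50):
    """Return the provisioning state and recent logs of an online deployment.

    Assumption: `ml_client` is an azure.ai.ml.MLClient that is already
    authenticated against the workspace owning the endpoint.
    """
    # Fetch the deployment entity to read its provisioning state.
    deployment = ml_client.online_deployments.get(
        name=deployment_name, endpoint_name=endpoint_name
    )
    # Fetch the last `log_lines` lines of the deployment's container logs.
    logs = ml_client.online_deployments.get_logs(
        name=deployment_name, endpoint_name=endpoint_name, lines=log_lines
    )
    return {
        "name": deployment.name,
        "provisioning_state": deployment.provisioning_state,
        "logs": logs,
    }


# Example call, using the names from the question:
# info = summarize_deployment(ml_client, "mnist-test-endpoint", "mnist-deployment-v1")
# print(info["provisioning_state"])
# print(info["logs"])
```

If the deployment was never created on the service side, the `get` call is expected to raise instead of returning a state, which is itself a useful signal that the request did not reach the cluster.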

As a workaround, I followed this document to create an online endpoint and a deployment for Kubernetes through the Python SDK.

You can use this Python code to attach an Arc-connected cluster to the workspace as a Kubernetes compute target.

Code:

from azure.ai.ml import load_compute

# For an Arc-connected cluster, the resource_id uses the Microsoft.Kubernetes/connectedClusters resource type.
# For an AKS cluster it would instead be
# '/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.ContainerService/managedClusters/<CLUSTER_NAME>'.
# ml_client is the MLClient created in the next code block.
compute_params = [
    {"name": "<COMPUTE_NAME>"},
    {"type": "kubernetes"},
    {
        "resource_id": "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Kubernetes/connectedClusters/<CLUSTER_NAME>"
    },
]
k8s_compute = load_compute(source=None, params_override=compute_params)

ml_client.begin_create_or_update(k8s_compute).result()
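
Arc-connected clusters and AKS clusters live under different resource providers, so it can help to build the resource ID explicitly rather than edit a long string by hand. Below is a small sketch; `k8s_resource_id` is a hypothetical helper introduced only for illustration, and `onpremk8scompute` is the compute name from the question. It assumes the standard ARM resource types `Microsoft.Kubernetes/connectedClusters` (Arc-enabled Kubernetes) and `Microsoft.ContainerService/managedClusters` (AKS).

```python
def k8s_resource_id(subscription_id, resource_group, cluster_name, arc=True):
    """Build the ARM resource ID of a Kubernetes cluster.

    arc=True  -> Arc-connected cluster (Microsoft.Kubernetes/connectedClusters)
    arc=False -> AKS cluster (Microsoft.ContainerService/managedClusters)
    """
    provider = (
        "Microsoft.Kubernetes/connectedClusters"
        if arc
        else "Microsoft.ContainerService/managedClusters"
    )
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{provider}/{cluster_name}"
    )


# Parameters in the same shape as the compute_params list used with load_compute.
compute_params = [
    {"name": "onpremk8scompute"},
    {"type": "kubernetes"},
    {
        "resource_id": k8s_resource_id(
            "<SUBSCRIPTION_ID>", "<RESOURCE_GROUP>", "<CLUSTER_NAME>"
        )
    },
]
```

Comparing this string with the resource ID shown for the cluster in the Azure portal is a quick way to rule out a mismatched provider or cluster name.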

You can use the below code to create an online endpoint and deployments.

Code:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    KubernetesOnlineEndpoint,
    KubernetesOnlineDeployment,
    Model,
    CodeConfiguration,
    Environment,
)
from azure.ai.ml.entities._deployment.resource_requirements_settings import (
    ResourceRequirementsSettings,
)
from azure.ai.ml.entities._deployment.container_resource_settings import (
    ResourceSettings,
)
from azure.identity import DefaultAzureCredential

subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace_name = "<AML_WORKSPACE_NAME>"
credential = DefaultAzureCredential()
ml_client = MLClient(
    credential,
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)
online_endpoint_name = "demo-endpoint326"

# Create an online endpoint on the attached Kubernetes compute.
# "compute_name" stands for the name of the compute attached above.
endpoint = KubernetesOnlineEndpoint(
    name=online_endpoint_name,
    compute="compute_name",
    description="this is a sample online endpoint",
    auth_mode="key",
    tags={"foo": "bar"},
)
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Local model file, conda environment and scoring code.
# The leading part of each path is elided here, as in the original post.
model = Model(path="...endpoints\\online\\model-1\\model\\sklearn_regression_model.pkl")
env = Environment(
    conda_file="...endpoints\\online\\model-1\\environment\\conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
)
blue_deployment = KubernetesOnlineDeployment(
    name="kuberntes326",
    endpoint_name=online_endpoint_name,
    model=model,
    environment=env,
    code_configuration=CodeConfiguration(
        code="...endpoints\\online\\model-1\\onlinescoring", scoring_script="score.py"
    ),
    instance_count=1,
    resources=ResourceRequirementsSettings(
        requests=ResourceSettings(
            cpu="100m",
            memory="0.5Gi",
        ),
    ),
)

# .result() blocks until the deployment finishes, so a failure surfaces here.
ml_client.online_deployments.begin_create_or_update(
    deployment=blue_deployment
).result()

Output:

Check: endpoint demo-endpoints326 exists
Uploading onlinescoring (0.0 MBs): 100%|########################################################| 4999/4999 [00:01<00:00, 4969.10it/s] 


Uploading sklearn_regression_model.pkl (< 1 MB): 100%|###############################################| 756/756 [00:00<00:00, 2.88kB/s] 
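
A newly created deployment receives no traffic until the endpoint's traffic map points at it, so a successful upload does not by itself make the endpoint answer scoring requests. Below is a minimal sketch of routing all traffic to the new deployment, assuming an authenticated `ml_client`; `route_all_traffic` is a hypothetical helper name, and the endpoint and deployment names in the example call are the ones used in the code above.

```python
def route_all_traffic(ml_client, endpoint_name, deployment_name):
    """Send 100% of an online endpoint's traffic to one deployment.

    Assumption: `ml_client` is an authenticated azure.ai.ml.MLClient and the
    deployment already exists under the endpoint.
    """
    # Read the current endpoint definition from the workspace.
    endpoint = ml_client.online_endpoints.get(name=endpoint_name)
    # Replace the traffic map so only the given deployment is served.
    endpoint.traffic = {deployment_name: 100}
    # Push the updated endpoint and wait for the operation to complete.
    return ml_client.online_endpoints.begin_create_or_update(endpoint).result()


# Example call:
# route_all_traffic(ml_client, "demo-endpoint326", "kuberntes326")
```

Once traffic is routed, the endpoint can be tested with `ml_client.online_endpoints.invoke`, passing a request file that matches what `score.py` expects.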

Reference: Introduction to Kubernetes compute target in Azure Machine Learning - Azure Machine Learning | Microsoft Learn

Venkatesan