Understanding inputs for Google AI Platform custom prediction routines

Question

I am following this documentation on custom prediction routines and I am trying to understand what the inputs to a custom prediction routine look like. The code that sends the input looks like this:

from googleapiclient import discovery

# project, model and version are assumed to be defined elsewhere.
instances = [
    [6.7, 3.1, 4.7, 1.5],
    [4.6, 3.1, 1.5, 0.2],
]
service = discovery.build('ml', 'v1')
name = 'projects/{}/models/{}'.format(project, model)

if version is not None:
    name += '/versions/{}'.format(version)

response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()

The Predictor.py is very simple at the moment. I am just trying to understand what the input looks like:

import numpy as np


class Predictor(object):
    """An example Predictor for an AI Platform custom prediction routine."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):

        inputs = np.asarray(instances)
        if kwargs.get('max'):
            return np.argmax(inputs, axis=1)

        return np.sum(inputs)


    @classmethod
    def from_path(cls, model_dir):
        return cls(None)

But when I try to get the response I get the following error:

{
  "error": "Prediction failed: unknown error."
}

Furthermore, it is extremely difficult to debug the code, because there is no way to step into it or print logs, so I have no idea what is going on. What does the input look like? How should I access it? This is just a simple test, but eventually I want to send images, which will be even more difficult to debug. How will I receive them? How will I preprocess them in the preprocessor? Let's assume that the preprocessing I did at training time looks like this:

import cv2
import numpy as np

def preprocess(img_path):
    data = cv2.imread(str(img_path))
    data = cv2.resize(data, (224, 224))
    data = cv2.cvtColor(data, cv2.COLOR_BGR2RGB)
    x = data.astype(np.float32) / 255.
    return np.expand_dims(x, axis=0)

What does the instances object look like, so that I can construct the preprocessor accordingly? Thank you in advance.

You can use `--enable-console-logging` when creating the Model to enable log output to StackDriver Logging — rhaertel80, Jun 24 '19 at 14:49

gogasca · Answer 1 · 2019-05-22T17:46:50.370

I'm building a new sample for custom prediction which may be useful for you to debug. First I write the file locally via a notebook (Colab):

%%writefile model_prediction.py

import numpy as np
import os
import pickle
import pandas as pd
import importlib

class CustomModelPrediction(object):
    _UNUSED_COLUMNS = ['fnlwgt', 'education', 'gender']
    _CSV_COLUMNS = [
        'age', 'workclass', 'fnlwgt', 'education', 'education_num',
        'marital_status', 'occupation', 'relationship', 'race', 'gender',
        'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
        'income_bracket'
    ]
    _CATEGORICAL_TYPES = {
        'workclass': pd.api.types.CategoricalDtype(categories=[
            'Federal-gov', 'Local-gov', 'Never-worked', 'Private',
            'Self-emp-inc',
            'Self-emp-not-inc', 'State-gov', 'Without-pay'
        ]),
        'marital_status': pd.api.types.CategoricalDtype(categories=[
            'Divorced', 'Married-AF-spouse', 'Married-civ-spouse',
            'Married-spouse-absent', 'Never-married', 'Separated', 'Widowed'
        ]),
        'occupation': pd.api.types.CategoricalDtype([
            'Adm-clerical', 'Armed-Forces', 'Craft-repair',
            'Exec-managerial',
            'Farming-fishing', 'Handlers-cleaners', 'Machine-op-inspct',
            'Other-service', 'Priv-house-serv', 'Prof-specialty',
            'Protective-serv',
            'Sales', 'Tech-support', 'Transport-moving'
        ]),
        'relationship': pd.api.types.CategoricalDtype(categories=[
            'Husband', 'Not-in-family', 'Other-relative', 'Own-child',
            'Unmarried',
            'Wife'
        ]),
        'race': pd.api.types.CategoricalDtype(categories=[
            'Amer-Indian-Eskimo', 'Asian-Pac-Islander', 'Black', 'Other',
            'White'
        ]),
        'native_country': pd.api.types.CategoricalDtype(categories=[
            'Cambodia', 'Canada', 'China', 'Columbia', 'Cuba',
            'Dominican-Republic',
            'Ecuador', 'El-Salvador', 'England', 'France', 'Germany',
            'Greece',
            'Guatemala', 'Haiti', 'Holand-Netherlands', 'Honduras', 'Hong',
            'Hungary',
            'India', 'Iran', 'Ireland', 'Italy', 'Jamaica', 'Japan', 'Laos',
            'Mexico',
            'Nicaragua', 'Outlying-US(Guam-USVI-etc)', 'Peru',
            'Philippines', 'Poland',
            'Portugal', 'Puerto-Rico', 'Scotland', 'South', 'Taiwan',
            'Thailand',
            'Trinadad&Tobago', 'United-States', 'Vietnam', 'Yugoslavia'
        ])
    }

    def __init__(self, model, processor):
        self._model = model
        self._processor = processor
        self._class_names = ['<=50K', '>50K']

    def _preprocess(self, instances):
        """Build a DataFrame from one instance and convert its categorical
        features to numeric codes.

        Args:
          instances: A single instance, given as a list of feature values
            in the order of `_CSV_COLUMNS` without the label column.
        """
        dataframe = pd.DataFrame(data=[instances], columns=self._CSV_COLUMNS[:-1])
        dataframe = dataframe.drop(columns=self._UNUSED_COLUMNS)
        # Convert integer valued (numeric) columns to floating point
        numeric_columns = dataframe.select_dtypes(['int64']).columns
        dataframe[numeric_columns] = dataframe[numeric_columns].astype(
            'float32')

        # Convert categorical columns to numeric
        cat_columns = dataframe.select_dtypes(['object']).columns
        # Keep categorical columns always using same values based on dict.
        dataframe[cat_columns] = dataframe[cat_columns].apply(
            lambda x: x.astype(self._CATEGORICAL_TYPES[x.name]))
        dataframe[cat_columns] = dataframe[cat_columns].apply(
            lambda x: x.cat.codes)
        return dataframe

    def predict(self, instances, **kwargs):
        preprocessed_data = self._preprocess(instances)
        preprocessed_inputs = self._processor.preprocess(preprocessed_data)
        outputs = self._model.predict_classes(preprocessed_inputs)
        if kwargs.get('probabilities'):
            return outputs.tolist()
        else:
            return [self._class_names[index] for index in
                    np.argmax(outputs, axis=1)]

    @classmethod
    def from_path(cls, model_dir):
        import tensorflow as tf
        model_path = os.path.join(model_dir, 'model.h5')
        model = tf.keras.models.load_model(model_path)

        preprocessor_path = os.path.join(model_dir, 'preprocessor.pkl')
        with open(preprocessor_path, 'rb') as f:
            preprocessor = pickle.load(f)

        return cls(model, preprocessor)

Once the file is written, I can test it locally like this before deploying the model:

from model_prediction import CustomModelPrediction
model = CustomModelPrediction.from_path('.')
instance = [25, 'Private', 226802, '11th', 7, 'Never-married', 'Machine-op-inspct', 'Own-child', 'Black', 'Male', 0, 0, 40, 'United-States']
model.predict(instance)
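
A local test like this can be taken one step closer to the deployed behaviour by pushing the request through JSON first. The sketch below assumes that the service hands the decoded "instances" list to predict() and JSON-encodes whatever predict() returns; simulate_request and EchoPredictor are hypothetical names used only for illustration, not part of any Google library.

```python
import json


def simulate_request(predictor, body, **kwargs):
    """Roughly mimic the round trip: JSON in, predict(), JSON out."""
    # The body travels as JSON text, so instances arrive as plain
    # Python lists, dicts, numbers and strings.
    decoded = json.loads(json.dumps(body))
    predictions = predictor.predict(decoded["instances"], **kwargs)
    # json.dumps raises TypeError for values it cannot encode,
    # for example a NumPy array that was not converted with .tolist().
    return json.dumps({"predictions": predictions})


class EchoPredictor(object):
    """Hypothetical stand-in predictor that sums each instance."""

    def predict(self, instances, **kwargs):
        return [sum(row) for row in instances]


body = {"instances": [[6.7, 3.1, 4.7, 1.5], [4.6, 3.1, 1.5, 0.2]]}
result = simulate_request(EchoPredictor(), body)
print(result)
```

If the real predictor class is passed in place of EchoPredictor, a TypeError raised here points at a return value that is not JSON-serializable, which is one possible cause of an otherwise opaque prediction failure.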

Another option: once you build the setup package, you can also test the installation locally, where my_custom_code-0.1.tar.gz is the file intended to be deployed to AI Platform:

 pip install --target=/tmp/custom_lib --no-cache-dir -b /tmp/pip_builds my_custom_code-0.1.tar.gz

Also take a look at this section:

You can use the --enable-console-logging flag to get logs exported to your project. You may need to create a new Model.
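
As a rough illustration of how to write to the log from inside a predictor: the sketch below uses only the standard logging module and writes to stderr, on the assumption that console logging is enabled and forwards the prediction node's stdout/stderr to Stackdriver Logging. The logger name, the describe helper and LoggingPredictor are hypothetical.

```python
import logging
import sys

logger = logging.getLogger("predictor_debug")  # hypothetical logger name
logger.setLevel(logging.INFO)
if not logger.handlers:
    # Write to stderr; assumed to be forwarded when console logging is on.
    logger.addHandler(logging.StreamHandler(sys.stderr))


def describe(instances):
    """Summarise what predict() received, for debugging."""
    first = instances[0] if instances else None
    return "instances: type=%s len=%d first=%r" % (
        type(instances).__name__, len(instances), first)


class LoggingPredictor(object):
    """Hypothetical predictor that only logs its inputs."""

    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        logger.info(describe(instances))
        logger.info("kwargs: %r", kwargs)
        # Return a plain list so the result is JSON-serializable.
        return [len(row) for row in instances]

    @classmethod
    def from_path(cls, model_dir):
        return cls(None)
```

Running LoggingPredictor.from_path('.').predict([[1, 2, 3]]) locally prints the same lines to the terminal, which makes it easy to compare local and deployed inputs.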

Thank you so much for your answer. If I replicate your second and third code snippets everything works fine, because I know what I am passing. But if I deploy on Google AI Platform and pass the input wrapped in a JSON (as explained here https://cloud.google.com/ml-engine/docs/v1/predict-request?#request-body) I get an error, and furthermore I think I am losing track of what I am actually receiving. If I go to AI platform > models > click on my model > click on my version > test & use, and paste a JSON input example there, I get `"error": "Internal error encountered."` . — DarioB, May 22 '19 at 20:38
Also, I have enabled the logging with `--enable-console-logging` (hopefully very useful), but where do I find the logs now? And how do I write to the log? Thanks — DarioB, May 22 '19 at 20:40
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/notebooks/tensorflow/custom-prediction-routine-keras.ipynb can you take a look at this sample in the mean time? I will provide more info — gogasca, May 22 '19 at 21:32
I haven't run the notebook, but I am following all the steps in my example. — DarioB, May 23 '19 at 12:08
I have run your notebook and it works fine (not surprisingly). But if I replace Predictor.predict with my code (nothing else changed) it fails with the same error... Maybe you could also try to replicate the error. It is still very difficult to debug. — DarioB, May 23 '19 at 15:56
Hi DarioB, sorry for the late reply. I just opened an issue about generating logs with --enable-console-logging (it looks like logs are not available). With custom prediction you still need the 'instances' key, and inside it the object you want to predict, a dictionary for example. This object becomes the **kwargs value, for example {"instances": "{"max": [1.1, 1.0, 0.1]}"}. What is your final JSON object? Have you tried: gcloud ai-platform predict --model=${MODEL_NAME} --version=${MODEL_VERSION} --json-instances=new-data.json ? — gogasca, May 23 '19 at 21:20
Hi, thank you for your answer. I tried what you suggested. I still get the same error. My input JSON looks like this `{"instances": [ {"values": [1, 2, 3, 4], "key": 1}, {"values": [5, 6, 7, 8], "key": 2} ]}` — DarioB, May 24 '19 at 08:44
@DarioB did you ever resolve this? I still have the same issue. — hockeybro, Jun 17 '19 at 22:50
yes I solved it, in my case I actually added a real model (rather than debug code) and it actually worked. Not sure why. I will add a more elaborate answer — DarioB, Jun 18 '19 at 10:54
@DarioB what do you mean real model? What is the difference between real model and debug code for you? For me I just save the model in Tensorflow using `saver.save` and upload that. For me I cannot even get it to work by passing in the JSON input data in the Google Cloud Console under: `AI platform > model > click on my model > click on my version > test & use`. — hockeybro, Jun 18 '19 at 19:50
I do have a model running by the way and I'm still getting this error. — hockeybro, Jun 18 '19 at 21:04
I see. The problem is likely how you wrap your data into a JSON object and how you read it at the predictor end. I was expecting to have a 'Key error' sort of error, but you actually don't. Try doing the prediction locally first, and have it working this way first. I hope it helps. — DarioB, Jun 20 '19 at 08:23
I got it figured it out. It was related to a different package that didn't have data that it needed. — hockeybro, Jun 20 '19 at 22:02
Hi, how would I be able to use a specify a `prediction-class` if the class is in a sub-directory. for instance in my tar.gz I upload it has the structure (top level) setup.py, categorisation_project. Then within the categorisation_project folder there is where my `predictor.py` is sat with `MyPredictor` which needs to be referenced. I've tried using `categorisation_project.predictor.MyPredictor` which didn't work. I also have an `__init__.py` in there to make sure it works right. Is there a way I can make this work? — Callum Smyth, Jun 16 '20 at 15:21

score 1 · Accepted Answer · answered Jun 18 '19 at 11:01

It looks like using debug code without a model does not work (at the time of this post). I used the following code to get everything working for my image prediction use case:

import base64

import googleapiclient.discovery

image_filename = 'your image path'
PROJECT_ID = ''
MODEL_NAME = ''
VERSION_NAME = ''

# Read the image and base64-encode it so it can travel inside the JSON body.
with open(image_filename, 'rb') as f:
    img = base64.b64encode(f.read()).decode()
image_bite_dict = {"key": "0", "image_bytes": {"b64": img}}

instances = [image_bite_dict]

service = googleapiclient.discovery.build('ml', 'v1')
name = 'projects/{}/models/{}/versions/{}'.format(
    PROJECT_ID, MODEL_NAME, VERSION_NAME)
response = service.projects().predict(
    name=name,
    body={'instances': instances}
).execute()
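
On the predictor side, each instance sent this way has to be turned back into image bytes before preprocessing. The helper below is only a sketch: extract_image_bytes is a hypothetical name, and because this thread does not confirm whether the service decodes the {"b64": ...} wrapper before calling predict(), it accepts both the wrapped base64 string and already-decoded bytes.

```python
import base64


def extract_image_bytes(instance):
    """Return the raw image bytes for one instance dict.

    Assumes the instance has an "image_bytes" entry shaped like the
    request above; handles both encoded and already-decoded payloads.
    """
    payload = instance["image_bytes"]
    if isinstance(payload, dict) and "b64" in payload:
        payload = payload["b64"]
    if isinstance(payload, str):
        # Still base64 text: decode it back to the original file bytes.
        return base64.b64decode(payload)
    return bytes(payload)


# Example with stand-in bytes instead of a real image file.
raw = b"not-a-real-image"
example = {"key": "0",
           "image_bytes": {"b64": base64.b64encode(raw).decode()}}
print(extract_image_bytes(example) == raw)
```

The resulting bytes could then be decoded into an array with cv2.imdecode(np.frombuffer(raw_bytes, np.uint8), cv2.IMREAD_COLOR), after which the resize, colour conversion and scaling steps from the question apply unchanged.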

I still can't get it to work. Can you please tell me exactly how you saved the model for deployment? Also can you add a sample of your prediction class? — hockeybro, Jun 18 '19 at 22:21
