Suppose you have a custom estimator, trained and ready to serve, like the one linked in your question. The procedure for saving and serving a trained estimator model is:
- Export the Estimator to the SavedModel format.
- Serve the SavedModel using the TensorFlow ModelServer.
- Feed inputs to the served model and observe the prediction results.
For some use cases your trained estimator model might be better deployed and reused without serving it at all: sometimes it's better to freeze a model and deploy it directly inside a program, and sometimes you want to convert a model to TensorFlow.js or TensorFlow Lite. There are many ways to reuse a trained estimator without serving it, but since your question specifically asks about serving, this answer deals specifically with serving it with the standard ModelServer.
1. Export to SavedModel format
From the docs:
To prepare a trained Estimator for serving, you must export it in the standard SavedModel format.
For this we can use the export_saved_model function (spelled export_savedmodel in older TensorFlow releases, as in your code), and doing that requires that we first define a serving input receiver function. The serving input receiver function specifies and names all the tensors that become inputs to the model at serve time.
There are two kinds of serving input receiver functions (one taking raw tensors, one parsing serialized tf.Example protos), and the kind you choose determines how inputs must be fed to the served model in step 3. Your colab code is building two receiver functions of the raw kind that do the same thing:
serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'input_tensors': tf.placeholder(tf.float32, I_SHAPE(None), name="input_tensors")})
and:
def serving_input_receiver_fn():
    input_tensors = tf.placeholder(tf.float32, I_SHAPE(None), name="input_tensors")
    features = {'input_tensors': input_tensors}
    receiver_tensors = {'input_tensors': input_tensors}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
But it exports with only one of them:
est.export_savedmodel('./here', serving_input_receiver_fn)
You could remove your serving_input_receiver_fn function and use the first definition:
est.export_savedmodel('./here', serving_fn)
and likewise pass it to your exporter:
exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=serving_fn,
    exports_to_keep=5
)
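If you are using tf.estimator.train_and_evaluate, the exporter is attached through the EvalSpec. Here is a minimal sketch, assuming train_input_fn and eval_input_fn are hypothetical stand-ins for your own input functions and est is the estimator from your colab:
# Sketch: hook the BestExporter into train_and_evaluate.
# train_input_fn and eval_input_fn are placeholders for your own input functions.
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(
    input_fn=eval_input_fn,
    exporters=[exporter],  # the BestExporter defined above
    throttle_secs=60)
tf.estimator.train_and_evaluate(est, train_spec, eval_spec)
With this wiring the exporter writes its SavedModels under <model_dir>/export/best_exporter/, which is a directory you can point the model server at in step 2.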
2. Serve the SavedModel
Your question said you prefer to serve your model without using Docker. According to its Dockerfile, the Docker image is just running the TensorFlow ModelServer binary, which you can install or build from source outside a container as described in its README, or you can copy it out of the tensorflow/serving container.
Once you have the binary installed, run it to launch a gRPC server listening on the port you want, for example 8500:
tensorflow_model_server --port=8500 --model_name=my_model --model_base_path=/path/to/export/dir
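Note that --model_base_path must point at the directory that contains the numbered version subdirectories, not at a specific version. export_savedmodel already produces that layout by writing each export into a timestamped subdirectory, so an export to ./here ends up looking roughly like this (the timestamp is only an example):
here/
└── 1563892498/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
Point --model_base_path at the parent directory and the server will load the newest version it finds there.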
Now you are "serving" the model. If you only want to run the model without needing anything from the tensorflow_serving repo, you could instead use the SavedModel command line interface (saved_model_cli) to run the SavedModel without a model server. It should already be installed with TensorFlow if you installed from a pre-built binary.
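For example, a quick sanity check with saved_model_cli could look like this (assuming the timestamped export directory from step 1; the actual path on your machine will differ):
saved_model_cli show --dir ./here/1563892498 --all
saved_model_cli run --dir ./here/1563892498 \
    --tag_set serve --signature_def serving_default \
    --input_exprs 'input_tensors=np.ones((3, 20, 7))'
The first command prints the same signature information we will query over gRPC below, and the second runs a prediction on a dummy batch of ones without any server involved.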
3. Query the running model server
The standard way of querying the model is with the gRPC service provided by the ModelServer. gRPC is an RPC framework that uses Google's protocol buffer format to define services and to communicate between hosts. It is designed to be fast, cross-platform, and scalable. It is especially convenient when all your data is already handled in protobuf format, like when dealing with TFRecord files.
There are gRPC libraries for many different languages, and you can even talk to your server with e.g. cURL, but since your question is tagged for Python I will use the grpcio and tensorflow-serving-api Python packages to execute the gRPC calls needed to predict with the served model.
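If you do ever want the cURL route: launching the server with an additional --rest_api_port flag, say --rest_api_port=8501, also exposes a JSON/REST endpoint, and a request could then look roughly like this sketch, where the inline Python just builds a dummy (3, 20, 7) batch of ones as a nested list:
curl -X POST http://localhost:8501/v1/models/my_model:predict \
    -d "{\"instances\": $(python -c 'import json; print(json.dumps([[[1.0]*7]*20]*3))')}"
The rest of this answer sticks to the gRPC path.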
Once the server is running and the Python packages are installed, you can verify the connection by querying the model's signature def metadata:
from __future__ import print_function
import grpc
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

with grpc.insecure_channel("localhost:8500") as channel:
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Ask the ModelServer for the model's signature_def metadata.
    request = get_model_metadata_pb2.GetModelMetadataRequest(
        model_spec=model_pb2.ModelSpec(name="my_model"),
        metadata_field=["signature_def"])
    response = stub.GetModelMetadata(request)
    sigdef_str = response.metadata["signature_def"].value
    print("Name:", response.model_spec.name)
    print("Version:", response.model_spec.version.value)
    print(get_model_metadata_pb2.SignatureDefMap.FromString(sigdef_str))
With the model from your colab you would see:
Name: my_model
Version: ...
signature_def {
  key: "labels"
  value {
    inputs {
      key: "input_tensors"
      value {
        name: "input_tensors:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 7
          }
        }
      }
    }
    outputs {
      key: "output"
      value {
        name: "Sigmoid:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 4
          }
        }
      }
    }
    method_name: "tensorflow/serving/predict"
  }
}
signature_def {
  key: "serving_default"
  value {
    inputs {
      key: "input_tensors"
      value {
        name: "input_tensors:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 7
          }
        }
      }
    }
    outputs {
      key: "output"
      value {
        name: "Sigmoid:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 4
          }
        }
      }
    }
    method_name: "tensorflow/serving/predict"
  }
}
So, according to its signature definition, the model expects a dictionary mapping an input_tensors key to a Tensor proto of floating point type and shape [-1, 20, 7], and it will output a dictionary mapping an output key to a Tensor proto of floating point type and shape [-1, 20, 4]. We can create a Tensor proto in Python from a numpy array using tf.make_tensor_proto and convert back using tf.make_ndarray:
from __future__ import print_function
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Dummy input data for batch size 3.
batch_input = np.ones((3, 20, 7), dtype="float32")

with grpc.insecure_channel("localhost:8500") as channel:
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest(
        model_spec=model_pb2.ModelSpec(name="my_model"),
        inputs={"input_tensors": tf.make_tensor_proto(batch_input)})
    response = stub.Predict(request)
    batch_output = tf.make_ndarray(response.outputs["output"])

print(batch_output.shape)
Indeed you should have a floating point array of shape (3, 20, 4) returned by your served estimator model.
For more information about how gRPC services are defined and used in Python, see the tutorial on the gRPC website. For tensorflow_serving API details, see the .proto protobuf definitions.