Suppose you have a custom estimator, trained and ready to serve, like the one linked in your question. The procedure for saving and serving a trained estimator model is:
- Export the Estimator to the SavedModel format.
- Serve the SavedModel using the TensorFlow ModelServer.
- Feed inputs to the served model and observe the prediction results.
For some use cases your trained estimator model might be better deployed and reused without serving it at all: sometimes it's better to freeze a model and deploy it directly inside a program, and sometimes you want to convert a model to TensorFlow.js or TensorFlow Lite. There are many ways to reuse a trained estimator without serving it, but since your question specifically asks about serving, this answer deals specifically with serving it with the standard ModelServer.
1. Export to SavedModel format
From the docs:
To prepare a trained Estimator for serving, you must export it in the standard SavedModel format.
For this we can use the export_saved_model function (spelled export_savedmodel in older TensorFlow releases, as in your code), and doing that requires that we first define a serving input receiver function. The serving input receiver function specifies and names all the tensors that become inputs to the model at serve time.
There are two kinds of serving input receiver functions (one taking raw tensors, one parsing serialized tf.Example protos), and the kind you choose determines how inputs must be fed to the served model in step 3. Your colab code is building two receiver functions of the raw kind that do the same thing:
serving_fn = tf.estimator.export.build_raw_serving_input_receiver_fn(
    {'input_tensors': tf.placeholder(tf.float32, I_SHAPE(None), name="input_tensors")})
and:
def serving_input_receiver_fn():
    input_tensors = tf.placeholder(tf.float32, I_SHAPE(None), name="input_tensors")
    features = {'input_tensors': input_tensors}
    receiver_tensors = {'input_tensors': input_tensors}
    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)
But it exports with only one of them:
est.export_savedmodel('./here', serving_input_receiver_fn)
You could remove your serving_input_receiver_fn function and use the first definition:
est.export_savedmodel('./here', serving_fn)
and likewise pass it to your exporter:
exporter = tf.estimator.BestExporter(
    name="best_exporter",
    serving_input_receiver_fn=serving_fn,
    exports_to_keep=5
)
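If you are using tf.estimator.train_and_evaluate, the exporter is attached through the EvalSpec. Here is a minimal sketch, assuming train_input_fn and eval_input_fn are hypothetical stand-ins for your own input functions and est is the estimator from your colab:
# Sketch: hook the BestExporter into train_and_evaluate.
# train_input_fn and eval_input_fn are placeholders for your own input functions.
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(
    input_fn=eval_input_fn,
    exporters=[exporter],  # the BestExporter defined above
    throttle_secs=60)
tf.estimator.train_and_evaluate(est, train_spec, eval_spec)
With this wiring the exporter writes its SavedModels under <model_dir>/export/best_exporter/, which is a directory you can point the model server at in step 2.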
2. Serve the SavedModel
Your question said you prefer to serve your model without using Docker. According to its Dockerfile, the Docker image is just running the TensorFlow ModelServer binary, which you can install or build from source outside a container as described in its README, or you can copy it out of the tensorflow/serving container.
Once you have the binary installed, run it to launch a gRPC server listening on the port you want, for example 8500:
tensorflow_model_server --port=8500 --model_name=my_model --model_base_path=/path/to/export/dir
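Note that --model_base_path must point at the directory that contains the numbered version subdirectories, not at a specific version. export_savedmodel already produces that layout by writing each export into a timestamped subdirectory, so an export to ./here ends up looking roughly like this (the timestamp is only an example):
here/
└── 1563892498/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index
Point --model_base_path at the parent directory and the server will load the newest version it finds there.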
Now you are "serving" the model. If you only want to run the model without needing anything from the tensorflow_serving repo, you could instead use the SavedModel command line interface (saved_model_cli) to run the SavedModel without a model server. It should already be installed with TensorFlow if you installed from a pre-built binary.
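For example, a quick sanity check with saved_model_cli could look like this (assuming the timestamped export directory from step 1; the actual path on your machine will differ):
saved_model_cli show --dir ./here/1563892498 --all
saved_model_cli run --dir ./here/1563892498 \
    --tag_set serve --signature_def serving_default \
    --input_exprs 'input_tensors=np.ones((3, 20, 7))'
The first command prints the same signature information we will query over gRPC below, and the second runs a prediction on a dummy batch of ones without any server involved.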
3. Query the running model server
The standard way of querying the model is with the gRPC service provided by the ModelServer. gRPC is an RPC framework that uses Google's protocol buffer format to define services and to communicate between hosts. It is designed to be fast, cross-platform, and scalable. It is especially convenient when all your data is already handled in protobuf format, like when dealing with TFRecord files.
There are gRPC libraries for many different languages, and you can even talk to your server with e.g. cURL, but since your question is tagged for Python I will use the grpcio and tensorflow-serving-api Python packages to execute the gRPC calls needed to predict with the served model.
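If you do ever want the cURL route: launching the server with an additional --rest_api_port flag, say --rest_api_port=8501, also exposes a JSON/REST endpoint, and a request could then look roughly like this sketch, where the inline Python just builds a dummy (3, 20, 7) batch of ones as a nested list:
curl -X POST http://localhost:8501/v1/models/my_model:predict \
    -d "{\"instances\": $(python -c 'import json; print(json.dumps([[[1.0]*7]*20]*3))')}"
The rest of this answer sticks to the gRPC path.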
Once the server is running and the Python packages are installed, you can verify the connection by querying the model's signature def metadata:
from __future__ import print_function
import grpc
from tensorflow_serving.apis import get_model_metadata_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

with grpc.insecure_channel("localhost:8500") as channel:
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    # Ask the ModelServer for the model's signature_def metadata.
    request = get_model_metadata_pb2.GetModelMetadataRequest(
        model_spec=model_pb2.ModelSpec(name="my_model"),
        metadata_field=["signature_def"])
    response = stub.GetModelMetadata(request)
    sigdef_str = response.metadata["signature_def"].value
    print("Name:", response.model_spec.name)
    print("Version:", response.model_spec.version.value)
    print(get_model_metadata_pb2.SignatureDefMap.FromString(sigdef_str))
With the model from your colab you would see:
Name: my_model
Version: ...
signature_def {
  key: "labels"
  value {
    inputs {
      key: "input_tensors"
      value {
        name: "input_tensors:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 7
          }
        }
      }
    }
    outputs {
      key: "output"
      value {
        name: "Sigmoid:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 4
          }
        }
      }
    }
    method_name: "tensorflow/serving/predict"
  }
}
signature_def {
  key: "serving_default"
  value {
    inputs {
      key: "input_tensors"
      value {
        name: "input_tensors:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 7
          }
        }
      }
    }
    outputs {
      key: "output"
      value {
        name: "Sigmoid:0"
        dtype: DT_FLOAT
        tensor_shape {
          dim {
            size: -1
          }
          dim {
            size: 20
          }
          dim {
            size: 4
          }
        }
      }
    }
    method_name: "tensorflow/serving/predict"
  }
}
So, according to its signature definition, the model expects a dictionary mapping an input_tensors key to a Tensor proto of floating point type and shape [-1, 20, 7], and it will output a dictionary mapping an output key to a Tensor proto of floating point type and shape [-1, 20, 4]. We can create a Tensor proto in Python from a numpy array using tf.make_tensor_proto and convert back using tf.make_ndarray:
from __future__ import print_function
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Dummy input data for batch size 3.
batch_input = np.ones((3, 20, 7), dtype="float32")

with grpc.insecure_channel("localhost:8500") as channel:
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
    request = predict_pb2.PredictRequest(
        model_spec=model_pb2.ModelSpec(name="my_model"),
        inputs={"input_tensors": tf.make_tensor_proto(batch_input)})
    response = stub.Predict(request)
    batch_output = tf.make_ndarray(response.outputs["output"])

print(batch_output.shape)
Indeed you should have a floating point array of shape (3, 20, 4) returned by your served estimator model.
For more information about how gRPC services are defined and used in Python, see the tutorial on the gRPC website. For tensorflow_serving API details, see the .proto protobuf definitions.