I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.
I would like to convert a TRT optimized frozen model to saved model for tensorflow serving. Are there any suggestions or sources to share?
Or are there any other ways to deploy a TRT optimized model in tensorflow serving?
Thanks.
Assuming you have a TRT optimized model (i.e., the model is represented already in UFF) you can simply follow the steps outlined here: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#python_topics. Pay special attention to section 3.3 and 3.4, since in these sections you actually build the TRT engine and then save it to a file for later use. From that point forward, you can just re-use the serialized engine (aka. a PLAN file) to do inference.
Basically, the workflow looks something like this:
Once those steps are done (and you should have sufficient example code in the link I provided) you can just load the PLAN file and re-use it over and over again for inference operations.
If you are still stuck, there is an excellent example that is installed by default here: /usr/src/tensorrt/samples/python/end_to_end_tensorflow_mnist
. You should be able to use that example to see how to get to the UFF format. Then you can just combine that with the example code found in the link I provided.