
I'm trying to serve a trained Keras model with TensorFlow Serving. The export step went fine; I use

with tf.device('/gpu:0'):

before loading the model. But when I try to serve it, the GPU device cannot be found.

TF_CPP_MIN_VLOG_LEVEL=1 CUDA_VISIBLE_DEVICES=2 /home/diana/serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --port=9002 --model_name=ex_61 --model_base_path=/home/diana/code/Tf_Serving/ex_61_servable
2017-09-21 13:37:42.659616: I tensorflow_serving/model_servers/main.cc:147] Building single TensorFlow model file config:  model_name: ex_61 model_base_path: /home/diana/code/Tf_Serving/ex_61_servable
2017-09-21 13:37:42.659869: I tensorflow_serving/model_servers/server_core.cc:441] Adding/updating models.
2017-09-21 13:37:42.659905: I tensorflow_serving/model_servers/server_core.cc:492]  (Re-)adding model: ex_61
2017-09-21 13:37:42.660097: I tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:390] File-system polling update: Servable:{name: ex_61 version: 1}; Servable path: /home/diana/code/Tf_Serving/ex_61_servable/1; Polling frequency: 1
2017-09-21 13:37:42.661475: I tensorflow_serving/core/aspired_versions_manager.cc:235] Enqueueing aspired versions request: {name: ex_61 version: 1}
2017-09-21 13:37:42.760051: I tensorflow_serving/core/aspired_versions_manager.cc:245] Processing aspired versions request: {name: ex_61 version: 1}
2017-09-21 13:37:42.760097: I tensorflow_serving/core/aspired_versions_manager.cc:287] Adding {name: ex_61 version: 1} to BasicManager
2017-09-21 13:37:42.760116: I tensorflow_serving/core/basic_manager.cc:315] Request to start managing servable {name: ex_61 version: 1}
2017-09-21 13:37:42.760158: I tensorflow_serving/core/availability_preserving_policy.cc:77] AvailabilityPreservingPolicy requesting to load servable {name: ex_61 version: 1}
2017-09-21 13:37:42.760177: I tensorflow_serving/core/aspired_versions_manager.cc:341] Taking action: { action: 0 id: {name: ex_61 version: 1} }
2017-09-21 13:37:42.760190: I tensorflow_serving/core/basic_manager.cc:479] Request to load servable {name: ex_61 version: 1}
2017-09-21 13:37:42.760208: I tensorflow_serving/core/loader_harness.cc:57] Load requested for servable version {name: ex_61 version: 1}
2017-09-21 13:37:42.760450: I tensorflow_serving/core/basic_manager.cc:705] Successfully reserved resources to load servable {name: ex_61 version: 1}
2017-09-21 13:37:42.760487: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: ex_61 version: 1}
2017-09-21 13:37:42.760505: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: ex_61 version: 1}
2017-09-21 13:37:42.760535: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /home/diana/code/Tf_Serving/ex_61_servable/1
2017-09-21 13:37:42.760559: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:236] Loading SavedModel from: /home/diana/code/Tf_Serving/ex_61_servable/1
2017-09-21 13:37:42.791626: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2017-09-21 13:37:42.791667: I external/org_tensorflow/tensorflow/core/common_runtime/local_device.cc:40] Local device intra op parallelism threads: 12
2017-09-21 13:37:42.792050: I external/org_tensorflow/tensorflow/core/common_runtime/direct_session.cc:86] Direct session inter op parallelism threads: 12
2017-09-21 13:37:42.823835: I external/org_tensorflow/tensorflow/core/common_runtime/optimization_registry.cc:37] Running optimization phase 0
2017-09-21 13:37:42.831347: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:284] Loading SavedModel: fail. Took 70656 microseconds.
2017-09-21 13:37:42.831414: E tensorflow_serving/util/retrier.cc:38] Loading servable: {name: ex_61 version: 1} failed: Invalid argument: Cannot assign a device for operation 'save/StringJoin': Operation was explicitly assigned to /device:GPU:0 but available devices are [ /job:localhost/replica:0/task:0/cpu:0 ]. Make sure the device specification refers to a valid device.
     [[Node: save/StringJoin = StringJoin[N=2, _output_shapes=[[]], separator="", _device="/device:GPU:0"](save/Const, save/StringJoin/inputs_1)]]

Note the last line: TensorFlow Serving cannot find the GPU device. How can I solve this problem? Thank you.

My TensorFlow environment:

(tensor27) diana@brick:~/code/Tf_Serving$ python
Python 2.7.13 |Continuum Analytics, Inc.| (default, Dec 20 2016, 23:09:15) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import tensorflow
>>> tensorflow.__path__
['/home/diana/anaconda3/envs/tensor27/lib/python2.7/site-packages/tensorflow']
>>> tensorflow.Session().run
2017-09-21 13:41:10.177715: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-21 13:41:10.177766: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-21 13:41:10.177786: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-09-21 13:41:10.177800: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-09-21 13:41:10.177811: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-09-21 13:41:10.528596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:05:00.0
Total memory: 7.92GiB
Free memory: 7.81GiB
2017-09-21 13:41:10.528628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-21 13:41:10.528634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-21 13:41:10.528650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:05:00.0)
Diana Yu
  • Configuring GPUs rarely goes smoothly. I advise you to consider using Docker with the nvidia-docker plugin, to free yourself of the configuration tasks and make your projects portable. – bluesummers Sep 21 '17 at 05:53
  • Thanks, I actually have that Docker image built already, but I'm tired of running docker cp all the time to interact with the local environment. I'll start trying it right away. – Diana Yu Sep 21 '17 at 07:52
  • Not the question you are asking but you can mount volumes in docker so that you don't have to copy. Use the -v flag when running the container. – Matt S Feb 28 '19 at 16:46
  • Are you sure you are using a version of tensorflow model server built with gpu support? – Matt S Feb 28 '19 at 16:52

1 Answer


If your problem is still not resolved, or for other people experiencing the same issue, try setting up your Docker as described here.
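For context, a GPU setup along those lines might look like the following (a sketch, not the linked instructions; the image tag is an assumption, while the model name and path are taken from the question):

```shell
# Pull a GPU-enabled build of the model server
# (the image tag is an assumption; pick one matching your TF version).
docker pull tensorflow/serving:latest-gpu

# Run with the NVIDIA runtime so the GPU is visible inside the container,
# mounting the exported servable instead of copying it in.
docker run --runtime=nvidia -p 9002:8500 \
    -v /home/diana/code/Tf_Serving/ex_61_servable:/models/ex_61 \
    -e MODEL_NAME=ex_61 \
    tensorflow/serving:latest-gpu
```

The -v mount also addresses the docker cp complaint from the comments: the servable directory stays on the host and is simply visible inside the container.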

gRPC client code for connecting to the TF Serving server is here.
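A minimal Python 2 client along those lines might look like this (a sketch, not the linked code; it assumes the tensorflow-serving-api package, and the input name 'input' and the tensor shape are placeholders that must match your exported signature):

```python
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

# Connect to the model server started with --port=9002 (host is an assumption).
channel = implementations.insecure_channel('localhost', 9002)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Build a PredictRequest for the ex_61 model.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'ex_61'
request.model_spec.signature_name = 'serving_default'

# 'input' and the shape are placeholders; they must match the signature
# the model was exported with.
data = np.random.rand(1, 224, 224, 3).astype(np.float32)
request.inputs['input'].CopyFrom(tf.contrib.util.make_tensor_proto(data))

# Blocking call with a 10-second timeout; prints the result protobuf.
result = stub.Predict(request, 10.0)
print(result)
```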

Disclaimer: the discussions above were published by me. I'm sharing them here, as they have been well tested.
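One additional note: the error in the question ("Operation was explicitly assigned to /device:GPU:0 but available devices are [... cpu:0]") means the model server build only exposes CPU devices. If CPU-only serving is acceptable, a commonly suggested workaround, separate from the Docker route above, is to clear the hard-coded device placements at export time. A TF 1.x sketch (the session comes from Keras here, and the signature map is omitted for brevity; a real export needs one):

```python
import tensorflow as tf
from keras import backend as K

export_path = '/home/diana/code/Tf_Serving/ex_61_servable/1'

# The session that holds the trained Keras model's graph and variables.
sess = K.get_session()

builder = tf.saved_model.builder.SavedModelBuilder(export_path)
builder.add_meta_graph_and_variables(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    # signature_def_map omitted for brevity; a real export needs one.
    # clear_devices=True strips the explicit /device:GPU:0 assignments,
    # so a CPU-only model server can place every op itself.
    clear_devices=True)
builder.save()
```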

Rohit Lal