
I'm trying to make a *.pb model using tf.estimator and export_savedmodel(). It is a simple classifier for the iris dataset (4 features, 3 classes):

import tensorflow as tf


num_epoch = 500
num_train = 120
num_test = 30

# 1 Define input function
def input_function(x, y, is_train):
    dict_x = {
        "thisisinput" : x,
    }

    dataset = tf.data.Dataset.from_tensor_slices((
        dict_x, y
    ))

    if is_train:
        dataset = dataset.shuffle(num_train).repeat(num_epoch).batch(num_train)
    else:   
        dataset = dataset.batch(num_test)

    return dataset


def my_serving_input_fn():
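    # The exported model will receive serialized tf.Example protos on the
    # "inputs" tensor; tf.parse_example below turns them back into the
    # dense "thisisinput" feature that the feature column expects.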
    input_data = tf.placeholder(tf.string, [None], name='input_tensors')
    receiver_tensors = {"inputs" : input_data}

    # 2 Define feature columns
    feature_columns = [
        tf.feature_column.numeric_column(key="thisisinput", shape=4),]
    features = tf.parse_example(
        input_data, 
        tf.feature_column.make_parse_example_spec(feature_columns))

    return tf.estimator.export.ServingInputReceiver(features, receiver_tensors)


def main(argv):
    tf.set_random_seed(1103)  # fix the seed so runs are reproducible

    # 2 Define feature columns
    feature_columns = [
        tf.feature_column.numeric_column(key="thisisinput", shape=4),]

    # 3 Define an estimator
    classifier = tf.estimator.DNNClassifier(
        feature_columns=feature_columns,
        hidden_units=[10],
        n_classes=3,
        optimizer=tf.train.GradientDescentOptimizer(0.001),
        activation_fn=tf.nn.relu,
        model_dir = 'modeliris2/'
    )

    # Train the model
    classifier.train(
        input_fn=lambda:input_function(xtrain, ytrain, True)
    )

    # Evaluate the model
    eval_result = classifier.evaluate(
        input_fn=lambda:input_function(xtest, ytest, False)
    )

    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
    print('\nSaving models...')
    classifier.export_savedmodel("modeliris2pb", my_serving_input_fn)


if __name__ == "__main__":
    tf.logging.set_verbosity(tf.logging.INFO)
    tf.app.run(main)
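
(xtrain, ytrain, xtest, and ytest come from data-loading code that is omitted above. A stand-in sketch using scikit-learn, not the original loading code, just so the snippet runs end to end:)

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical replacement for the omitted loading step: 120 train / 30 test
# rows of the 4-feature iris data, matching num_train and num_test above.
iris = load_iris()
xtrain, xtest, ytrain, ytest = train_test_split(
    iris.data.astype(np.float32), iris.target.astype(np.int32),
    train_size=120, test_size=30, random_state=1103)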

Running this produces a saved_model.pb file. I've confirmed that the model works, and I can write another program that loads and runs it. Now I want to summarize and freeze the model using Bazel. If I build the summarize_graph tool with Bazel and then run the following command:

bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
--in_graph=saved_model.pb

I get the following error:

[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:1: Invalid control characters encountered in text.
[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:4: Interpreting non ascii codepoint 218.
[libprotobuf ERROR external/protobuf_archive/src/google/protobuf/text_format.cc:307] Error parsing text-format tensorflow.GraphDef: 1:4: Expected identifier, got: �
2018-08-14 11:50:17.759617: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:320] Loading graph 'saved_model.pb' failed with Can't parse saved_model.pb as binary proto
(both text and binary parsing failed for file saved_model.pb)
2018-08-14 11:50:17.759670: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:322] usage: bazel-bin/tensorflow/tools/graph_transforms/summarize_graph
Flags:
--in_graph="" string input graph file name
--print_structure=false bool whether to print the network connections of the graph

I don't understand this error. I've tried the same command on an Inception .pb file and it works perfectly, so I think the problem is in how tf.estimator builds the .pb file.

Am I missing something when using export_savedmodel() or tf.estimator to create a saved model?

UPDATE

Tensorflow version: v1.9.0-0-g25c197e023 1.9.0

Result of tf_env_collect.sh:

== cat /etc/issue ===============================================
Linux rianadam 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
VERSION="18.04.1 LTS (Bionic Beaver)"
VERSION_ID="18.04"
VERSION_CODENAME=bionic

== are we in docker =============================================
No

== compiler =====================================================
c++ (Ubuntu 7.3.0-16ubuntu3) 7.3.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


== uname -a =====================================================
Linux rianadam 4.15.0-32-generic #35-Ubuntu SMP Fri Aug 10 17:58:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

== check pips ===================================================
numpy               1.15.0 
protobuf            3.6.0  
tensorflow-gpu      1.9.0  

== check for virtualenv =========================================
True

== tensorflow import ============================================
tf.VERSION = 1.9.0
tf.GIT_VERSION = v1.9.0-0-g25c197e023
tf.COMPILER_VERSION = v1.9.0-0-g25c197e023
Sanity check: array([1], dtype=int32)
/home/rian/NgodingYuk/tf_env/env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/home/rian/NgodingYuk/tf_env/env/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)

== env ==========================================================
LD_LIBRARY_PATH /usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:/usr/local/cuda/lib64:/usr/local/cuda-9.0/lib64:
DYLD_LIBRARY_PATH is unset

== nvidia-smi ===================================================
Tue Aug 21 11:13:55 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.77                 Driver Version: 390.77                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce 920M        Off  | 00000000:04:00.0 N/A |                  N/A |
| N/A   51C    P0    N/A /  N/A |    367MiB /  2004MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

== cuda libs  ===================================================
/usr/local/cuda-9.0/lib64/libcudart_static.a
/usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
/usr/local/cuda-9.0/doc/man/man7/libcudart.7
/usr/local/cuda-9.0/doc/man/man7/libcudart.so.7

1 Answer


I ran into this same problem when I was trying to find the input/output nodes of a model I had trained using a custom tf.Estimator. The error comes from the fact that the output you get from export_savedmodel is a servable (as I currently understand it, a GraphDef plus other metadata), not simply a GraphDef, which is what summarize_graph expects.
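
A quick way to see this (a sketch, assuming the TF 1.x proto modules): saved_model.pb parses as a SavedModel protobuf message, and the GraphDef that summarize_graph wants is nested inside its MetaGraphDef:

from tensorflow.core.protobuf import saved_model_pb2

# Parse the exported file as a SavedModel (not a GraphDef) message.
sm = saved_model_pb2.SavedModel()
with open("/path/to/saved/model/saved_model.pb", "rb") as f:
    sm.ParseFromString(f.read())

# The GraphDef lives inside the (first) MetaGraphDef of the SavedModel.
graph_def = sm.meta_graphs[0].graph_def
print(len(graph_def.node), "nodes")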

To find the input and output nodes, you can do the following:

# -*- coding: utf-8 -*-

import tensorflow as tf
from tensorflow.saved_model import tag_constants

with tf.Session(graph=tf.Graph()) as sess:
    gf = tf.saved_model.loader.load(
        sess,
        [tag_constants.SERVING],
        "/path/to/saved/model/")

    nodes = gf.graph_def.node
    print([n.name + " -> " + n.op for n in nodes
           if n.op in ('Softmax', 'Placeholder')])

    # ... ['Placeholder -> Placeholder',
    #      'dnn/head/predictions/probabilities -> Softmax']

I used the canned DNNClassifier as well, so the OP's node names should be the same as mine; for other users, the operation names may differ from Placeholder and Softmax depending on your classifier.
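
If the names are hard to spot in that list, the saved_model_cli utility that ships with TensorFlow can also print each signature's inputs and outputs for an export directory (the path below is a placeholder):

saved_model_cli show --dir /path/to/saved/model/ --all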

Now that you have the names of your input/output nodes, you can freeze the graph, which is addressed here

If you want to work with the values of your trained parameters, for example to quantize weights, you'll need to run tensorflow/python/tools/freeze_graph.py to convert the checkpoint values into embedded constants within the graph file itself.

#!/bin/bash

python ./freeze_graph.py \
  --in_graph="/path/to/model/saved_model.pb" \
  --input_checkpoint="/MyModel/model.ckpt-xxxx" \
  --output_graph="/home/user/pruned_saved_model_or_whatever.pb" \
  --input_saved_model_dir="/path/to/model" \
  --output_node_names="dnn/head/predictions/probabilities"

Then, assuming you have graph_transforms built:

#!/bin/bash

tensorflow/bazel-bin/tensorflow/tools/graph_transforms/summarize_graph \
  --in_graph=pruned_saved_model_or_whatever.pb

Output:

Found 1 possible inputs: (name=Placeholder, type=string(7), shape=[?])
No variables spotted.
Found 1 possible outputs: (name=dnn/head/predictions/probabilities, op=Softmax)
Found 256974297 (256.97M) const parameters, 0 (0) variable parameters, and 0 
control_edges
Op types used: 155 Const, 41 Identity, 32 RegexReplace, 18 Gather, 9 
StridedSlice, 9 MatMul, 6 Shape, 6 Reshape, 6 Relu, 5 ConcatV2, 4 BiasAdd, 4 
Add, 3 ExpandDims, 3 Pack, 2 NotEqual, 2 Where, 2 Select, 2 StringJoin, 2 Cast, 
2 DynamicPartition, 2 Fill, 2 Maximum, 1 Size, 1 Unique, 1 Tanh, 1 Sum, 1 
StringToHashBucketFast, 1 StringSplit, 1 Equal, 1 Squeeze, 1 Square, 1 
SparseToDense, 1 SparseSegmentSqrtN, 1 SparseFillEmptyRows, 1 Softmax, 1 
FloorDiv, 1 Rsqrt, 1 FloorMod, 1 HashTableV2, 1 LookupTableFindV2, 1 Range, 1 
Prod, 1 Placeholder, 1 ParallelDynamicStitch, 1 LookupTableSizeV2, 1 Max, 1 Mul
To use with tensorflow/tools/benchmark:benchmark_model try these arguments:
bazel run tensorflow/tools/benchmark:benchmark_model -- --graph=pruned_saved_model.pb --show_flops --input_layer=Placeholder --input_layer_type=string --input_layer_shape=-1 --output_layer=dnn/head/predictions/probabilities

Hope this helps.

Update (2018-12-03):

A related GitHub issue I opened appears to have been resolved; a detailed blog post is linked at the end of the ticket.

  • wow, thank you, your code helped me find the input and output nodes, and summarize_graph works now using your step-by-step instructions :) – malioboro Aug 23 '18 at 08:59
  • I just realized my pruned_saved_model_or_whatever.pb can't be opened with `tensorflow.contrib.from_saved_model`. The error says `MetaGraphDef associated with tags 'serve' could not be found in SavedModel`; do you know why this happens? – malioboro Aug 27 '18 at 07:15
  • Once you freeze the graph, it is no longer a `saved model`. – o-90 Aug 27 '18 at 12:20
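
For reference, a frozen GraphDef is loaded with tf.import_graph_def rather than the SavedModel loader. A minimal sketch (TF 1.x; the tensor names are taken from the summarize_graph output above, and the question's model may use "input_tensors:0" instead of "Placeholder:0" since its placeholder was created with name='input_tensors'):

import tensorflow as tf

# Read the frozen graph produced by freeze_graph.py.
graph_def = tf.GraphDef()
with tf.gfile.GFile("pruned_saved_model_or_whatever.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    # name="" keeps the original node names from the frozen graph.
    tf.import_graph_def(graph_def, name="")
    inp = graph.get_tensor_by_name("Placeholder:0")
    out = graph.get_tensor_by_name("dnn/head/predictions/probabilities:0")

# The serving input fn expects serialized tf.Example protos, so build one
# with the four "thisisinput" float features (values here are illustrative).
example = tf.train.Example(features=tf.train.Features(feature={
    "thisisinput": tf.train.Feature(
        float_list=tf.train.FloatList(value=[5.1, 3.3, 1.7, 0.5])),
}))

with tf.Session(graph=graph) as sess:
    probs = sess.run(out, feed_dict={inp: [example.SerializeToString()]})
    print(probs)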