
The models in the TF Object Detection Zoo each ship with meta+ckpt files, a frozen .pb file, and a SavedModel directory.

I tried to use the meta+ckpt files to train further, and also to extract the weights of particular tensors for research purposes. However, the models don't seem to have any trainable variables.

vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
print(vars)

The above snippet prints an empty list ([]). I also tried the following.

vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES)
print(vars)

Again, I get an empty [] list.

How is this possible? Has the model been stripped of its variables, or were they created with tf.Variable(..., trainable=False)? Where can I get meta+ckpt files with valid trainable variables? I am specifically looking at the SSD+MobileNet models.
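For what it's worth, one way to check whether the weights at least exist on disk is to read the checkpoint shards directly, bypassing the graph collections entirely. A minimal sketch, assuming the zoo's usual file layout (the `ssd_mobilenet_v2` path is hypothetical):

```python
import os

def ckpt_prefix(meta_path):
    """Derive the checkpoint prefix (e.g. 'model.ckpt') from a .meta path."""
    return meta_path[:-len('.meta')] if meta_path.endswith('.meta') else meta_path

prefix = ckpt_prefix('ssd_mobilenet_v2/model.ckpt.meta')  # hypothetical path

if os.path.exists(prefix + '.index'):
    # Imported lazily so the sketch only needs TF when the files are present.
    import tensorflow as tf
    # list_variables reads the .index/.data shards directly, so it shows the
    # tensors stored on disk even when the graph collections are empty.
    for name, shape in tf.train.list_variables(prefix):
        print(name, shape)
```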

UPDATE:

The following is the code snippet I'm using for restoring. It lives inside a class, since I am building a custom tool for an application.

def _importer(self):
    sess = tf.InteractiveSession()
    with sess.as_default():
        reader = tf.train.import_meta_graph(self.metafile,
                                            clear_devices=True)
        reader.restore(sess, self.ckptfile)

def _read_graph(self):
    sess = tf.get_default_session()
    with sess.as_default():
        vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
        print(vars)
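One pitfall with a class-based setup like this (also raised in the comments) is querying the collections on a different graph from the one the meta graph was imported into. A minimal sketch that pins both steps to one explicit graph object, using a hypothetical path:

```python
import os

META = 'ssd_mobilenet_v2/model.ckpt.meta'  # hypothetical path

if os.path.exists(META):
    import tensorflow as tf  # TF 1.x API, imported lazily for the sketch
    graph = tf.Graph()
    with graph.as_default():
        saver = tf.train.import_meta_graph(META, clear_devices=True)
        # Query the collection on the *same* graph object, not whatever
        # graph happens to be the default in the calling code.
        print(graph.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
```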

UPDATE 2:

I also tried the following snippet, in a simple restoring style.

model_dir = 'ssd_mobilenet_v2/'

meta = glob.glob(model_dir+"*.meta")[0]
ckpt = meta.replace('.meta','').strip()

graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        reader = tf.train.import_meta_graph(meta, clear_devices=True)
        reader.restore(sess, ckpt)

        vari = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
        for var in vari:
            print(var.name,"\n")

The above code snippet also gives an empty [] variable list.
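Another quick diagnostic (just a sketch, with the same hypothetical path): after importing the meta graph, scan the graph's operations for variable ops. If there are none, the variables were stripped or folded into constants at export time, which would explain the empty collections regardless of how the graph is restored:

```python
import os

META = 'ssd_mobilenet_v2/model.ckpt.meta'  # hypothetical path

if os.path.exists(META):
    import tensorflow as tf  # TF 1.x API, imported lazily for the sketch
    graph = tf.Graph()
    with graph.as_default():
        tf.train.import_meta_graph(META, clear_devices=True)
    # Variable ops present in the GraphDef, independent of any collections.
    var_ops = [op.name for op in graph.get_operations()
               if op.type in ('VariableV2', 'VarHandleOp')]
    print(len(var_ops), 'variable ops found')
```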

  • It is possible that the graphs have replaced variables with constants for improved performance. However, I don't know where to find the trainable models. – Addy Jun 11 '19 at 15:46
  • @Addy I understand the `.pb` files would have variables converted to constants for inference performance. But the meta+ckpt files are meant for accessing the variables for further training. Correct me if I'm wrong, because these are the same `ckpt` files that can be used to restore and fine-tune models in the TF pipeline config files. – lamo_738 Jun 11 '19 at 15:47
  • From what I've seen in the link you provided, it seems to me as well that the `.ckpt` files should contain trainable variables. So the only thing that comes to mind without seeing the code is that you might not be loading the checkpoint correctly, or perhaps it's not loaded into the default graph. If you provide a sample of the code that does the restoring itself, more can be said. – Addy Jun 11 '19 at 15:56
  • @Addy I have updated the question with the minimal code snippet I use for importing. This snippet has been tested and works perfectly for numerous other `meta+ckpt` models. Now I'm wondering whether the detection zoo models are some different version of meta+ckpt? Not clear yet. – lamo_738 Jun 11 '19 at 17:27

1 Answer


After a little research, the final answer to your question is: no, they don't. It becomes obvious once you notice that the variables directory in saved_model is empty.

The checkpoint archive provided by the object detection model zoo contains the following files:

.
|-- checkpoint
|-- frozen_inference_graph.pb
|-- model.ckpt.data-00000-of-00001
|-- model.ckpt.index
|-- model.ckpt.meta
|-- pipeline.config
`-- saved_model
    |-- saved_model.pb
    `-- variables

The pipeline.config file is the configuration for the saved model, and frozen_inference_graph.pb is for off-the-shelf inference. Notice that checkpoint, model.ckpt.data-00000-of-00001, model.ckpt.meta and model.ckpt.index all correspond to the checkpoint. (Here you can find a nice explanation.)

So when you want to get the trainable variables, the only thing useful is the saved_model directory.

Use SavedModel to save and load your model—variables, the graph, and the graph's metadata. This is a language-neutral, recoverable, hermetic serialization format that enables higher-level systems and tools to produce, consume, and transform TensorFlow models.

To recover the SavedModel you can use the API tf.saved_model.loader.load(), which has an argument called tags that specifies the type of MetaGraphDef. So if you want to get the trainable variables, you need to pass tag_constants.TRAINING when calling the API.
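Note that the tags are just plain strings, which is why the error message quotes 'train' literally. For reference, the values come from TensorFlow's own tag_constants module:

```python
from tensorflow.python.saved_model import tag_constants

# The tag constants are plain strings embedded in the MetaGraphDef.
print(tag_constants.TRAINING)  # 'train'
print(tag_constants.SERVING)   # 'serve'
```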

I tried to call this API to recover the variables, but instead it gave me an error saying:

MetaGraphDef associated with tags 'train' could not be found in SavedModel. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: saved_model_cli

So I ran the saved_model_cli command to inspect all tags available in the SavedModel:

# from the saved_model directory
saved_model_cli show --dir . --all

and the output is

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
...
signature_def['serving_default']:
  ...

So there is no train tag, only serve, within this SavedModel. The SavedModel here is therefore only intended for TensorFlow Serving. This means that when these files were created, the training tag was not specified, so no trainable variables can be recovered from them.

P.S.: the following code is what I used for restoring the SavedModel. With tag_constants.TRAINING, loading cannot be completed; with tag_constants.SERVING, loading succeeds but the list of variables is empty.

graph = tf.Graph()
with tf.Session(graph=graph) as sess:
  tf.saved_model.loader.load(sess, [tf.saved_model.tag_constants.TRAINING], export_dir)
  variables = graph.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)
  print(variables)

P.P.S.: I found the script used for creating the SavedModel here. It can be seen that indeed no train tag was specified when creating the SavedModel.
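For completeness, a hedged sketch of what a modified exporter could look like if you want a 'train' tag-set. It assumes you have already built a fresh training graph in a live session `sess` (e.g. via the object detection pipeline); this is not part of the zoo's actual export script:

```python
import tensorflow as tf  # TF 1.x API
from tensorflow.python.saved_model import tag_constants

def export_with_train_tag(sess, export_dir):
    """Write a SavedModel whose MetaGraphDef carries the 'train' tag-set,
    so tf.saved_model.loader.load(sess, [tag_constants.TRAINING], ...) works."""
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    # Attach the session's variables to a MetaGraphDef tagged 'train'.
    builder.add_meta_graph_and_variables(sess, [tag_constants.TRAINING])
    builder.save()
```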

  • Thanks for the detailed explanation. So in conclusion you are saying that TF has nowhere provided a link to a model with trainable variables, only inference-ready models? Also, regarding the pipeline config you mentioned, is it possible for me to run the pipeline with the above `checkpoint` and specify a saving method in the config that saves a trainable model? – lamo_738 Jun 11 '19 at 20:55
  • But I have to say there is a way you can obtain all the trainable variables. Just start a training session by running `model_main.py`; a graph will be created and you can get the trainable variables (using TensorBoard you can inspect all variables linked to the `train` operation). But for the pretrained models provided in the zoo, it seems you cannot obtain trainable variables from them. – Danny Fang Jun 11 '19 at 21:10
  • Okay, that's great, I will definitely try it out. One last question: should I bazel-build the `tensorflow/models` repo separately, or can I just clone it and run directly? It would be helpful if you could point me to where `model_main.py` is. – lamo_738 Jun 11 '19 at 21:16
  • Just clone the repo, but you have to install the API first to be able to run it. Here is the guide for installing the API, along with many other useful documents: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md and https://github.com/tensorflow/models/tree/master/research/object_detection – Danny Fang Jun 11 '19 at 21:19
  • Sorry, I was not very clear last night. I don't think you can specify a saving method in the config file to get a model that has trainable variables; you will have to modify the graph export script to achieve that. – Danny Fang Jun 12 '19 at 07:23
  • Yes, that's what I thought in the first place. I will change the script a bit to change the saving tags, and might also throw in a simple_save in the script. – lamo_738 Jun 12 '19 at 14:24
  • @danyfang I'm not quite sure I understand your conclusion. You say the SavedModels were exported for inference only, but is it possible to re-export them for training by using the checkpoint and config? – Mat Oct 03 '19 at 14:00
  • "After a little bit of research, the final answer to your question is YES." The word should be NO, according to the overall description! (In English, to agree with someone's negative statement, you would just say NO, right?) – ZDL-so May 20 '20 at 07:07