Restoring a model trained with tf.estimator and feeding input through feed_dict

Question

I trained a ResNet with tf.estimator, and the model was saved during training. The saved files are .data, .index and .meta files. I'd like to load this model back and get predictions for new images. During training, the data was fed to the model using tf.data.Dataset. I have closely followed the resnet implementation given here.

I would like to restore the model and feed inputs to the nodes using a feed_dict.

First attempt

  # Rebuild the input pipeline
  images, labels = input_fn(data_dir, batch_size=32, num_epochs=1)

  # Rebuild the graph
  prediction = imagenet_model_fn(
      images, labels,
      {'batch_size': 32, 'data_format': 'channels_first', 'resnet_size': 18},
      mode=tf.estimator.ModeKeys.EVAL).predictions

  saver = tf.train.Saver()
  with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state(r'./model')
    saver.restore(sess, ckpt.model_checkpoint_path)
    # Run until the one-shot iterator is exhausted
    while True:
      try:
        pred, im = sess.run([prediction, images])
        print(pred)
      except tf.errors.OutOfRangeError:
        break

I fed in a dataset that had already been evaluated on the same model using classifier.evaluate, but the above method gives wrong predictions: the model returns the same class, with probability 1.0, for every image.

Second attempt

saver = tf.train.import_meta_graph(r'.\resnet\model\model-3220.meta')
sess = tf.Session()
saver.restore(sess,tf.train.latest_checkpoint(r'.\resnet\model'))
graph = tf.get_default_graph()
inputImage = graph.get_tensor_by_name('image:0')
logits= graph.get_tensor_by_name('logits:0')

#Get prediction
print(sess.run(logits,feed_dict={inputImage:newimage}))

This also gives wrong predictions compared to classifier.evaluate. I can even run sess.run(logits) without a feed_dict!

Third attempt

def serving_input_fn():
  receiver_tensor = {'feature': tf.placeholder(shape=[None, 384, 256, 3], dtype=tf.float32)}
  features = {'feature': receiver_tensor['feature']}
  return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)

It fails with

Traceback (most recent call last):
  File "imagenet_main.py", line 213, in <module>
    tf.app.run(argv=[sys.argv[0]] + unparsed)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 124, in run
    _sys.exit(main(argv))
  File "imagenet_main.py", line 204, in main
    resnet.resnet_main(FLAGS, imagenet_model_fn, input_fn)
  File "C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\resnet.py", line 527, in resnet_main
    classifier.export_savedmodel(export_dir_base=r"C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\export", serving_input_receiver_fn=serving_input_fn)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 528, in export_savedmodel
    config=self.config)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 725, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "imagenet_main.py", line 200, in imagenet_model_fn
    loss_filter_fn=None)
  File "C:\Users\Photogauge\Desktop\iprings_images\models-master\models-master\official\resnet\resnet.py", line 433, in resnet_model_fn
    tf.argmax(labels, axis=1), predictions['classes'])
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\util\deprecation.py", line 316, in new_func
    return func(*args, **kwargs)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\math_ops.py", line 208, in argmax
    return gen_math_ops.arg_max(input, axis, name=name, output_type=output_type)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 508, in arg_max
    name=name)
  File "C:\Users\Photogauge\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 528, in _apply_op_helper
    (input_name, err))
ValueError: Tried to convert 'input' to a tensor and failed. Error: None values not supported.

The code I used for training and building the model is as below:

Specification for parsing the dataset:

def parse_record(raw_record, is_training):
  keys_to_features = {
      'image/encoded':
          tf.FixedLenFeature((), tf.string, default_value=''),
      'image/class/label':
          tf.FixedLenFeature([], dtype=tf.int64, default_value=-1),
  }
  parsed = tf.parse_single_example(raw_record, keys_to_features)
  image = tf.image.decode_image(
      tf.reshape(parsed['image/encoded'], shape=[]),3)
  image = tf.image.convert_image_dtype(image, dtype=tf.float32)
  label = tf.cast(
      tf.reshape(parsed['image/class/label'], shape=[]),
      dtype=tf.int32)
  return image, tf.one_hot(label,2)

The following function parses the data and creates batches for training

def input_fn(is_training, data_dir, batch_size, num_epochs=1):
  dataset = tf.data.Dataset.from_tensor_slices(
      filenames(is_training, data_dir))
  if is_training:
    # Shuffle the input files
    dataset = dataset.shuffle(buffer_size=_FILE_SHUFFLE_BUFFER)
  dataset = dataset.flat_map(tf.data.TFRecordDataset)
  dataset = dataset.map(lambda value: parse_record(value, is_training),
                        num_parallel_calls=5)
  dataset = dataset.prefetch(batch_size)
  if is_training:
    # Shuffle the parsed records
    dataset = dataset.shuffle(buffer_size=_SHUFFLE_BUFFER)
  dataset = dataset.repeat(num_epochs)
  dataset = dataset.batch(batch_size)

  iterator = dataset.make_one_shot_iterator()
  images, labels = iterator.get_next()
  return images, labels

The classifier is created as follows, for training on the training set and evaluating on the validation set:

classifier = tf.estimator.Estimator(
    model_fn=model_function, model_dir=flags.model_dir, config=run_config,
    params={
        'resnet_size': flags.resnet_size,
        'data_format': flags.data_format,
        'batch_size': flags.batch_size,
    })

# Training cycle
classifier.train(
    input_fn=lambda: input_function(
        True, flags.data_dir, flags.batch_size, flags.epochs_per_eval),
    hooks=[logging_hook])

# Evaluate the model
eval_results = classifier.evaluate(input_fn=lambda: input_function(
    False, flags.data_dir, flags.batch_size))

The three attempts above are how I tried to load the model and get predictions from it.

What is the right way to restore a saved model and perform inference on it? I want to feed images directly, without using tf.data.Dataset.

Update

The value of ckpt after running ckpt = tf.train.get_checkpoint_state(r'./model') is:

model_checkpoint_path: "./model\model.ckpt-5980"
all_model_checkpoint_paths: "./model\model.ckpt-5060"
all_model_checkpoint_paths: "./model\model.ckpt-5061"
all_model_checkpoint_paths: "./model\model.ckpt-5520"
all_model_checkpoint_paths: "./model\model.ckpt-5521"
all_model_checkpoint_paths: "./model\model.ckpt-5980"

The output is the same when I try `saver.restore(sess, tf.train.latest_checkpoint(r'.\resnet\model'))`.
Passing the full path to saver.restore gives the same output. In all cases the same model, model.ckpt-5980, was restored.

About the first attempt: What is the value of `ckpt` after `ckpt = tf.train.get_checkpoint_state(r'./model')`? What happens if you instead restore via `saver.restore(sess, tf.train.latest_checkpoint(r'.\resnet\model'))`? And what if you pass a string with the full path of the checkpoint to `restore`? — GPhilo, Feb 09 '18 at 10:56
Thanks. Since TensorFlow is not raising any errors during restoration and your update shows the checkpoint file you're loading should be the right one, my next guess is the data in the checkpoint itself. I'm looking at your input pipeline for the training and I see no `shuffle`. Is your data shuffled somewhere? — GPhilo, Feb 09 '18 at 11:55
Another interesting thing is, your checkpoint has a really low number. `5980` is the value of `global_step` when the checkpoint was saved, so -unless I'm missing something- your network saw a total of `32*5980 = 191360` samples.. that's hardly enough to train resnet. Are you *sure* the checkpoint actually contains weights of a trained network? — GPhilo, Feb 09 '18 at 12:01
@GPhilo Yes, data is shuffled during training as `if is_training: dataset = dataset.shuffle(buffer_size=_FILE_SHUFFLE_BUFFER)`. Inside the classifier, there is a method called `classifier.evaluate`, this usually takes the validation set as the input. I tried the test set on this method after the model was trained and saved, i.e. I commented out `classifier.train` and pointed the validation dataset directory to test directory. I could see that the same model, `model.ckpt-5980` was being restored and I got `84%` accuracy. — Effective_cellist, Feb 09 '18 at 12:06
Sorry, I did not see that `shuffle`. The thing is, up there you're only shuffling the filenames, not the actual samples in the files. Unless you have one sample per tfrecord file (which I assume is not the case), all the samples in each file will be in the same order at every epoch. — GPhilo, Feb 09 '18 at 12:12
@GPhilo Sorry, I missed out a line of code. There is one more shuffle after the records are parsed. — Effective_cellist, Feb 09 '18 at 12:15
Here is a detailed example with the latest TensorFlow version, 1.7: https://stackoverflow.com/a/52222383/5904928 — Aaditya Ura, Sep 07 '18 at 12:20

score 1 · Answer 1 · edited Jun 20 '20 at 09:12

Note: This answer will evolve as more information becomes available. I'm not sure this is the most appropriate way to do it, but it feels better than using just comments. Feel free to drop a comment on the answer if this is inappropriate.

About your second attempt:

I don't have much experience with the import_meta_graph method, but if sess.run(logits) runs without complaining, I think the meta graph also contains your input pipeline.

A quick test I just made confirms that the pipeline is indeed restored too when you load the metagraph. This means, you're not actually passing in anything via feed_dict, because the input is taken from the Dataset-based pipeline that was used when the checkpoint was taken. From my research, I can't find a way to provide a different input function to the graph.
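
A quick test of this kind can be sketched as follows. This is a toy reproduction, not the ResNet from the question: the one-variable model, the temporary checkpoint path and the tensor name `logits` are assumptions made for illustration, and the code assumes a TensorFlow build that ships `tensorflow.compat.v1` (1.14 or later, or 2.x). It saves a graph whose input comes from a one-shot Dataset iterator, reloads it with import_meta_graph, and then looks at which op types came back.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph API

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Toy "training" graph: a Dataset pipeline feeding a single matmul.
with tf.Graph().as_default():
    data = np.arange(12, dtype=np.float32).reshape(4, 3)
    dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
    images = tf.data.make_one_shot_iterator(dataset).get_next()
    weights = tf.get_variable("w", initializer=tf.ones([3, 1]))
    tf.matmul(images, weights, name="logits")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Saver.save also writes the meta graph next to the checkpoint.
        tf.train.Saver().save(sess, ckpt_prefix)

# Fresh graph: load the meta graph and inspect what was restored with it.
with tf.Graph().as_default() as graph:
    saver = tf.train.import_meta_graph(ckpt_prefix + ".meta")
    op_types = [op.type for op in graph.get_operations()]
    dataset_ops = [t for t in op_types if "Iterator" in t or "Dataset" in t]
    logits = graph.get_tensor_by_name("logits:0")
    with tf.Session() as sess:
        saver.restore(sess, ckpt_prefix)
        # No feed_dict: the input is pulled from the restored pipeline.
        first_batch = sess.run(logits)

print(dataset_ops)
print(first_batch)
```

If `dataset_ops` is non-empty and `logits` evaluates without a feed_dict, the input is coming from the restored pipeline rather than from anything passed in.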

About the first attempt:

Your code looks right to me, so my suspicion is that the checkpoint file that gets loaded is somehow wrong. I asked for some clarification in a comment and will update this part as soon as that information is available.
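
Separately from the checkpoint question, the first attempt still reads its input from input_fn. If the goal is to feed images through feed_dict, one possible approach is to rebuild the same model on a placeholder and restore the weights by variable name. The sketch below shows the idea on a toy graph: `toy_model` is a hypothetical stand-in for imagenet_model_fn, the shapes and checkpoint path are made up, and `tensorflow.compat.v1` is assumed to be available (TF 1.14 or later, or 2.x).

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph API


def toy_model(images):
    """Hypothetical stand-in for the real model function."""
    weights = tf.get_variable("w", shape=[3, 1],
                              initializer=tf.zeros_initializer())
    return tf.matmul(images, weights, name="logits")


ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# "Training" graph: the model reads from a tf.data pipeline.
with tf.Graph().as_default():
    data = np.ones((4, 3), dtype=np.float32)
    dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
    images = tf.data.make_one_shot_iterator(dataset).get_next()
    toy_model(images)
    weights = tf.global_variables()[0]
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # Stand-in for training: give the weights non-initial values.
        sess.run(weights.assign([[1.0], [2.0], [3.0]]))
        tf.train.Saver().save(sess, ckpt_prefix)

# Inference graph: the same model code, built on a placeholder instead.
with tf.Graph().as_default():
    image_input = tf.placeholder(tf.float32, shape=[None, 3],
                                 name="image_input")
    logits = toy_model(image_input)
    with tf.Session() as sess:
        # Variables are matched by name ("w"); no initializer is run.
        tf.train.Saver().restore(sess, ckpt_prefix)
        prediction = sess.run(
            logits, feed_dict={image_input: [[1.0, 1.0, 1.0]]})

print(prediction)
```

This only works when the inference graph creates variables with the same names and shapes as the ones stored in the checkpoint, which is why the same model-building function is reused for both graphs.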

How did you test if the input pipeline is restored when metagraph is loaded? — Effective_cellist, Feb 09 '18 at 21:22
I get the operations in the graph via `tf.get_default_graph().get_operations()` and I then inspect the list. The first items are Dataset-related operations — GPhilo, Feb 10 '18 at 11:22
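
Once the Dataset ops have been located this way, one option that may allow feeding through feed_dict is the `input_map` keyword, which import_meta_graph forwards to the underlying graph importer. It rewires every consumer of a saved tensor to a tensor you supply. The following is a minimal sketch on a toy graph, under the same assumptions as before (`tensorflow.compat.v1` available, made-up model and paths); whether it carries over unchanged to the ResNet graph in the question has not been verified here.

```python
import os
import tempfile

import numpy as np
import tensorflow.compat.v1 as tf  # TF1-style graph API

ckpt_prefix = os.path.join(tempfile.mkdtemp(), "model.ckpt")

# Toy "training" graph whose input is a Dataset of all zeros.
with tf.Graph().as_default():
    data = np.zeros((4, 3), dtype=np.float32)
    dataset = tf.data.Dataset.from_tensor_slices(data).batch(2)
    images = tf.data.make_one_shot_iterator(dataset).get_next()
    # Name of the pipeline output tensor; in a real graph this would be
    # found by inspecting graph.get_operations().
    pipeline_output_name = images.name
    weights = tf.get_variable("w", initializer=tf.ones([3, 1]))
    tf.matmul(images, weights, name="logits")
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        tf.train.Saver().save(sess, ckpt_prefix)

# Fresh graph: swap the pipeline output for a placeholder while importing.
with tf.Graph().as_default() as graph:
    image_input = tf.placeholder(tf.float32, shape=[None, 3],
                                 name="image_input")
    saver = tf.train.import_meta_graph(
        ckpt_prefix + ".meta",
        input_map={pipeline_output_name: image_input})
    logits = graph.get_tensor_by_name("logits:0")
    with tf.Session() as sess:
        saver.restore(sess, ckpt_prefix)
        fed_output = sess.run(
            logits, feed_dict={image_input: [[1.0, 2.0, 3.0]]})

print(fed_output)
```

Because the saved Dataset contains only zeros, a non-zero result indicates that the fed values, and not the restored pipeline, reached the model.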

score 1 · Answer 2 · answered Feb 10 '18 at 14:58

If you have the model as a .pb or .pb.txt file, inference is easy: the predictor module can run it for you. Check out here for more information. For image data it will be something similar to the example below. Hope this helps!

Example code:

import numpy as np
import tensorflow as tf  # tf.contrib.predictor requires TensorFlow 1.x

def extract_data(index=0, filepath='data/cifar-10-batches-bin/data_batch_5.bin'):
    # Each record is 1 label byte followed by 3 * 32 * 32 image bytes.
    label_bytes_length = 1
    image_bytes_length = (32 ** 2) * 3
    record_bytes_length = label_bytes_length + image_bytes_length
    with open(filepath, mode='rb') as bytestream:
        bytestream.seek(record_bytes_length * index, 0)
        label_bytes = bytestream.read(label_bytes_length)
        image_bytes = bytestream.read(image_bytes_length)
    label = np.frombuffer(label_bytes, dtype=np.uint8)
    image = np.frombuffer(image_bytes, dtype=np.uint8)
    # Stored channel-first; convert to height x width x channels.
    image = np.reshape(image, [3, 32, 32])
    image = np.transpose(image, [1, 2, 0])
    image = image.astype(np.float32)
    return {'image': image, 'label': label}


# saved_model_dir is the directory of the exported SavedModel.
predictor_fn = tf.contrib.predictor.from_saved_model(
    export_dir=saved_model_dir, signature_def_key='predictions')

N = 1000
labels = []
images = []
for i in range(N):
    result = extract_data(i)
    images.append(result['image'])
    labels.append(result['label'][0])

output = predictor_fn({'images': images})

Kishore Karunakaran


I don't have a `.pb` or `.pbtxt` file. Like I mentioned in the question, I have `.meta`, `.index`, and `.data` files, and also the checkpoint. In order to use the `predictor` function, I have to export the model with `estimator.export_savedmodel(my_export_dir, serving_input_fn)`. It's the `serving_input_fn` I can't figure out – Effective_cellist Feb 10 '18 at 18:27
Here is an example of how I would do it. Let me know if it works for you. https://gist.github.com/kishorenayar/b9991b673b06b449703c07e3626982e5 – Kishore Karunakaran Feb 11 '18 at 03:24
Thanks Kishore. I've updated my question. Please take a look at the section "Third attempt". – Effective_cellist Feb 11 '18 at 06:55
I have opened another question related to [this](https://stackoverflow.com/questions/48726633/piping-the-output-of-dataset-iterator-through-a-placeholder) – Effective_cellist Feb 11 '18 at 06:58
@dpk, are you using tf-record as an input dataset to the estimator model? – Kishore Karunakaran Feb 12 '18 at 08:00
The dataset is stored as a `tfrecord` file. The input pipeline is built with `tf.data.Dataset` – Effective_cellist Feb 12 '18 at 08:52
Please find it [here](https://github.com/deepaksuresh/models/blob/master/official/resnet/resnet.py) – Effective_cellist Feb 12 '18 at 16:20
