
I am working with TensorFlow 1.4.

I created a custom tf.estimator in order to do classification, like this:

def model_fn(features, labels, mode, params):
    # Some operations here
    [...]

    return tf.estimator.EstimatorSpec(mode=mode,
                                      predictions={"Preds": predictions},
                                      loss=cost,
                                      train_op=train_op,
                                      eval_metric_ops=eval_metric_ops,
                                      training_hooks=[summary_hook])

my_estimator = tf.estimator.Estimator(model_fn=model_fn,
                                      params=model_params,
                                      model_dir='/my/directory')

I can train it easily:

input_fn = create_train_input_fn(path=train_files)
my_estimator.train(input_fn=input_fn)

where input_fn is a function that reads data from TFRecord files using the tf.data.Dataset API.
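For context, here is a minimal sketch of what such an input function could look like in TensorFlow 1.4; the feature names and types in the parsing spec are hypothetical and would need to match the actual TFRecord schema:

import tensorflow as tf

def create_train_input_fn(path):
    def input_fn():
        def parse(serialized):
            # Hypothetical feature spec; adapt to your TFRecord schema
            spec = {"image": tf.FixedLenFeature([], tf.string),
                    "label": tf.FixedLenFeature([], tf.int64)}
            example = tf.parse_single_example(serialized, spec)
            features = {"image": tf.decode_raw(example["image"], tf.uint8)}
            return features, example["label"]

        dataset = tf.data.TFRecordDataset(path).map(parse).batch(32)
        # A one-shot iterator needs no explicit initialization
        return dataset.make_one_shot_iterator().get_next()
    return input_fn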

As I am reading from TFRecord files, I don't have the labels in memory when I am making predictions.

My question is, how can I have predictions AND labels returned, either by the predict() method or the evaluate() method?

It seems there is no way to have both: predict() does not have access to labels, and it is not possible to access the predictions dictionary with the evaluate() method.

  • As you correctly noted, you don't have labels in predict (because that's for inference, i.e., you use that to classify new data). The problem is that the `evaluate` call won't return the labels, because it runs a loop over all your dataset and computes aggregated metrics, which are then returned. If you want to have for each batch both the prediction and the labels, you'll have to load the model from the checkpoint, make a `tf.Session()` and loop `sess.run([predictions, labels])` calls until your data is exhausted. – GPhilo Nov 17 '17 at 12:07
  • Thanks for your comment. That's what I was afraid of. The whole point of using tf.estimator was to avoid tf.Session() and building the graph. Too bad there is no other way to do it. – Benjamin Larrousse Nov 20 '17 at 08:45
  • The good thing is, however, that you don't need to explicitly build the graph. All you need is a checkpoint of your trained estimator (or a frozen/exported version of it), which you have anyway after the training. With that, you can import everything directly (meaning both the graph architecture and the learned weights) without the need to build the graph. Then you can `tf.get_default_graph().get_tensor_by_name('logits')` (adapt as needed) and run your graph; see the sketch after these comments. – GPhilo Nov 20 '17 at 08:48
  • It seems silly to ask, but how can I retrieve _labels_ this way? I can add _logits_ to the checkpoint file (e.g. with tf.add_to_collection) but it's not working with _labels_. – Benjamin Larrousse Nov 28 '17 at 17:40
  • @GPhilo do you have an idea? What am I missing? – Benjamin Larrousse Nov 30 '17 at 09:09
  • I'm not sure I understand what you want to do... – GPhilo Nov 30 '17 at 09:10
  • Well, after training my model with tf.Estimator, I want to export 2 lists, one with _labels_ and one with _predictions_, in order to do some analysis (like a calibration curve). But as we said, I have to make a `tf.Session()`, and I cannot do `sess.run([predictions, labels])` because labels are read on the fly from tfrecords with the `tf.data.Dataset` API, and it seems I cannot save a Tensor holding these label values and retrieve it through my checkpoint. – Benjamin Larrousse Nov 30 '17 at 13:22
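For illustration, a minimal sketch of the restore-by-name approach GPhilo describes in the comments above; the tensor name 'logits:0' and the checkpoint directory are assumptions to adapt:

import tensorflow as tf

# Import both the graph architecture and the learned weights
# from the latest checkpoint, without rebuilding the model by hand
ckpt_path = tf.train.latest_checkpoint('/my/directory')
saver = tf.train.import_meta_graph(ckpt_path + '.meta')

with tf.Session() as sess:
    saver.restore(sess, ckpt_path)
    # Hypothetical tensor name; inspect the imported graph for the real one
    logits = tf.get_default_graph().get_tensor_by_name('logits:0')
    logit_values = sess.run(logits)

As the follow-up comments note, this restores the graph and weights, but the label values read on the fly from the tf.data pipeline are not part of the checkpoint, which is what the answer below works around.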

1 Answer


After you have finished your training, '/my/directory' contains a bunch of checkpoint files.

You need to set up your input pipeline again, manually restore one of those checkpoints, and then loop through your batches, storing the predictions and the labels:

import tensorflow as tf

# Rebuild the input pipeline
input_fn = create_eval_input_fn(path=eval_files)
features, labels = input_fn()

# Rebuild the model, reusing the params from training
predictions = model_fn(features, labels,
                       tf.estimator.ModeKeys.EVAL, model_params).predictions

# Manually load the latest checkpoint. The Saver must be created *after*
# the graph is built, so it can pick up the model variables.
saver = tf.train.Saver()
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('/my/directory')
    saver.restore(sess, ckpt.model_checkpoint_path)

    # Loop through the batches and store predictions and labels
    prediction_values = []
    label_values = []
    while True:
        try:
            preds, lbls = sess.run([predictions, labels])
            prediction_values += list(preds["Preds"])
            label_values += list(lbls)
        except tf.errors.OutOfRangeError:
            # Raised by the tf.data iterator when the dataset is exhausted
            break
    # store prediction_values and label_values somewhere

Update: changed to directly use the model_fn function you already have.
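With prediction_values and label_values collected, the calibration analysis mentioned in the comments above could be done, for example, with scikit-learn; the use of sklearn here, and the assumption that the predictions are positive-class probabilities for a binary problem, are not part of the original answer:

import numpy as np
from sklearn.calibration import calibration_curve

# Assumes binary labels and predicted probabilities for the positive class
y_true = np.asarray(label_values)
y_prob = np.asarray(prediction_values)

# Fraction of positives vs. mean predicted probability, per bin
frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)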

GPhilo
  • Thanks for your response. I see what you want to do here; I don't know why I couldn't get this done by myself. One more thing though: what is your `model` function here? Assuming it's the function that builds the graph, it corresponds to my `model_fn` defined above. However, it returns a `tf.EstimatorSpec` and not the actual _predictions_ variable. Any idea how to get _predictions_ from the `tf.EstimatorSpec`? – Benjamin Larrousse Nov 30 '17 at 15:46
  • See my updated code: you can get the predictions simply via the `predictions` attribute of the `EstimatorSpec`. – GPhilo Nov 30 '17 at 15:54
  • Ok, thanks a lot, I didn't know we could do `.predictions` like this. It works perfectly now. – Benjamin Larrousse Nov 30 '17 at 17:54
  • @GPhilo I tried your solution, and got an error from `saver`: `ValueError("No variables to save")`. I fixed that with `saver = tf.train.import_meta_graph(r'./model.ckpt-5980.meta')`. Then I got `FailedPreconditionError: Attempting to use uninitialized value batch_normalization_14/gamma`. I found that the fix for this was `sess.run(tf.global_variables_initializer())`, but then all the predictions are wrong compared to predictions from `classifier.evaluate` on the same dataset. – Effective_cellist Feb 08 '18 at 20:15
  • That sounds like you're having another problem somewhere. The "no variables to save" error happens because you define the saver before your graph. Note how I define it __after__ calling model_fn; that is important. You should only create the saver once the whole graph is defined, because it picks up the variables to save from the global variables collection (see the sketch after these comments). – GPhilo Feb 08 '18 at 20:40
  • As for why the classification is wrong: you're initializing all the variables, which means wiping away the trained weights and reinitializing them to random values. Of course this will cause the predictions to be wrong. – GPhilo Feb 08 '18 at 20:41
  • @GPhilo could you please take a look at [this](https://stackoverflow.com/questions/48679622/restoring-a-model-trained-with-tf-estimator-and-performing-inference-on-it)? This is the [code](https://github.com/deepaksuresh/models/blob/master/official/resnet/resnet.py) I referred to when building my model. – Effective_cellist Feb 09 '18 at 05:23
  • @GPhilo I just fixed the `"No variables to save"` and got rid of the `global_variables_initializer`. Now my restore/predict code is the same as in your answer. The model predicts one class for all images. I ran the same dataset on `classifier.evaluate` from my training code, and predictions are accurate and consistent with each run. Do you know why this could happen? – Effective_cellist Feb 09 '18 at 05:29
  • Open a new question, put all the needed information and your code in it, then link it here and I'll have a look. – GPhilo Feb 09 '18 at 07:45
  • @GPhilo I've opened a question [here](https://stackoverflow.com/questions/48679622/restoring-a-model-trained-with-tf-estimator-and-feeding-input-through-feed-dict). Please take a look. Thanks for your time. – Effective_cellist Feb 09 '18 at 09:58
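To make the Saver-ordering point from the comments concrete, a minimal sketch (the names input_fn, model_fn and model_params are as in the answer above):

import tensorflow as tf

# Wrong: created before any variables exist -> ValueError("No variables to save")
# saver = tf.train.Saver()

# Build the graph first...
features, labels = input_fn()
spec = model_fn(features, labels, tf.estimator.ModeKeys.EVAL, model_params)

# ...then create the Saver, so it picks up the model variables
# from the global variables collection
saver = tf.train.Saver()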