MLflow: Log steps in evaluation phase using TensorFlow train_and_evaluate

Question

I'm trying to log metrics for every evaluation step with MLflow, but so far I have only been able to log the last step. With mlflow.tensorflow.autolog() I can log some metrics (such as loss) whenever a checkpoint is saved, which happens every 100 steps as defined in RunConfig. However, I also need to log accuracy and top3error each time the model is evaluated, i.e. every 100 steps. Here is my code:

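import mlflow
import mlflow.tensorflow
import tensorflow as tf  # TF 1.x API (tf.metrics, tf.logging)

def top3error(features, labels, predictions):
    # Streaming mean of in_top_k over the evaluation set
    return {'top3error': tf.metrics.mean(tf.nn.in_top_k(predictions=predictions['logits'],
                                                        targets=labels,
                                                        k=3))}

# Log metrics
mlflow.tensorflow.autolog()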

with mlflow.start_run():
    steps = 1000

    mlflow.log_param("Steps", steps)    

    # Training & validation
    train_spec = tf.estimator.TrainSpec(input_fn=generate_input_fn(train), 
                                        max_steps=steps)
    eval_spec = tf.estimator.EvalSpec(name='validation',
                                      input_fn=generate_input_fn(test, num_epochs=1))

    tf.logging.info("Starting Run...")
    results = tf.estimator.train_and_evaluate(m, train_spec, eval_spec)    

    # Log run
    mlflow.log_metric("accuracy", results[0]['accuracy'])
    mlflow.log_metric("top3error", results[0]['top3error'])

Here is the RunConfig used in the model:

config = tf.estimator.RunConfig(
    model_dir=model_dir,
    save_checkpoints_steps=100,
)

Thanks in advance

@zishanahmed It's not possible through MLflow. The only way is to have your own training loop. MLflow suggests that a future version will be able to save all the metrics from TensorBoard. – Daniel Zapata, Oct 29 '19 at 15:45

score 1 · Answer 1 · answered Aug 05 '19 at 19:54

You can achieve this by specifying the metrics you want to log in your Estimator. However, unless you use some sort of training loop and iterate over the steps yourself, you won't be able to do this directly.

See https://stackoverflow.com/a/45716062
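
For illustration, here is a minimal sketch of what such a loop could look like. It is not the asker's code and not an MLflow feature: the helper name train_with_periodic_eval and its parameters are hypothetical, and the wiring shown in the comments assumes that the training input_fn repeats long enough to supply the requested steps. The idea is to train in chunks, evaluate after each chunk, and pass the step number along with every metric (mlflow.log_metric accepts an optional step argument).

```python
from typing import Callable, Dict


def train_with_periodic_eval(
    train_fn: Callable[[int], object],
    eval_fn: Callable[[], Dict[str, float]],
    log_fn: Callable[..., object],
    max_steps: int,
    eval_every: int,
) -> None:
    """Alternate training and evaluation, logging each eval metric with its step."""
    step = 0
    while step < max_steps:
        chunk = min(eval_every, max_steps - step)
        train_fn(chunk)  # train for `chunk` more steps
        step += chunk
        for name, value in eval_fn().items():
            if name == "global_step":  # Estimator.evaluate also returns this key
                continue
            log_fn(name, float(value), step=step)


# Hypothetical wiring for the question's objects (m, generate_input_fn, train
# and test come from the question and are not defined here):
#
# with mlflow.start_run():
#     train_with_periodic_eval(
#         train_fn=lambda n: m.train(input_fn=generate_input_fn(train), steps=n),
#         eval_fn=lambda: m.evaluate(
#             input_fn=generate_input_fn(test, num_epochs=1), name="validation"),
#         log_fn=mlflow.log_metric,
#         max_steps=1000,
#         eval_every=100,
#     )
```

Passing the train, evaluate, and log callables in as arguments keeps the loop independent of TensorFlow and MLflow, so it can be checked with simple stand-in functions before it is attached to a real Estimator.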

Apurva Koti
