
I'm getting different AUROC values depending on when I calculate them. My code is:

    import tensorflow as tf

    def auc_roc(y_true, y_pred):
        # any tensorflow metric
        value, update_op = tf.metrics.auc(y_true, y_pred)
        return update_op

    model.compile(loss='binary_crossentropy', optimizer=optim, metrics=['accuracy', auc_roc])

    my_callbacks = [roc_callback(training_data=(x_train, y_train), validation_data=(x_test, y_test))]

    model.fit(x_train, y_train, validation_data=(x_test, y_test), callbacks=my_callbacks)

Here roc_callback is a Keras callback that calculates the AUROC at the end of each epoch using roc_auc_score from sklearn; I use the code defined in [this answer](https://stackoverflow.com/questions/41032551/how-to-compute-receiving-operating-characteristic-roc-and-auc-in-keras/46844409#46844409).
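For context, the epoch-end part of that callback looks roughly like this (condensed from the linked answer; the other callback hooks are no-ops):

    from sklearn.metrics import roc_auc_score
    from keras.callbacks import Callback

    class roc_callback(Callback):
        def __init__(self, training_data, validation_data):
            self.x, self.y = training_data
            self.x_val, self.y_val = validation_data

        def on_epoch_end(self, epoch, logs={}):
            # predict on the *full* train/validation sets and score them with sklearn
            roc = roc_auc_score(self.y, self.model.predict(self.x))
            roc_val = roc_auc_score(self.y_val, self.model.predict(self.x_val))
            print('\rroc-auc: %s - roc-auc_val: %s' % (round(roc, 4), round(roc_val, 4)))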

When I train the model, I get the following statistics:

  Train on 38470 samples, validate on 9618 samples
  Epoch 1/15
  38470/38470 [==============================] - auc_roc: 0.5116 - val_loss: 0.6899 - val_acc: 0.6274 - val_auc_roc: 0.5440

  roc-auc_val: 0.5973                                                                                                    

  Epoch 2/15
  38470/38470 [==============================] - auc_roc: 0.5777 - val_loss: 0.6284 - val_acc: 0.6870 - val_auc_roc: 0.6027

  roc-auc_val: 0.6391 

  .
  .
  .
  .
  .
  .
  .


  Epoch 12/15
  38470/38470 [==============================] - auc_roc: 0.8754 - val_loss: 0.9569 - val_acc: 0.7747 - val_auc_roc: 0.8779

  roc-auc_val: 0.6369

So why does the AUROC calculated during training go up with each epoch, and why is it different from the one calculated at the end of the epoch?

HMK

2 Answers


During training, the metrics are calculated per batch, and Keras keeps updating them with every new batch as a kind of running mean between the current batch's metric and the previous results.

Your callback, on the other hand, calculates on the entire data, and only at the end of the epoch. Some difference between the two methods is normal.

It's very common to see the next epoch start with a metric far better than the value shown for the previous epoch, because the old value's mean includes a lot of batches that were processed when the model was less trained.
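To see how much that running mean can lag behind the model's current state, here is a tiny synthetic illustration (made-up numbers, not your data):

    import numpy as np

    # pretend the per-batch metric improves steadily over one epoch as the model trains
    batch_metric = np.linspace(0.50, 0.90, 1200)

    # the progress bar shows (roughly) the running mean over the batches seen so far,
    # so the number printed at the end of the epoch is dragged down by the early batches
    running_mean = np.cumsum(batch_metric) / np.arange(1, len(batch_metric) + 1)
    print('metric on the last batch :', round(batch_metric[-1], 4))   # ~0.90
    print('value shown for the epoch:', round(running_mean[-1], 4))   # ~0.70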

You can make a more precise comparison by calling model.evaluate(x_test, y_test). I'm not sure whether calling it during training causes conflicts, but you could train one epoch at a time and call it between epochs.
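A minimal sketch of that epoch-by-epoch approach (it reuses the model, data and compiled metrics from your question, so treat the names as placeholders):

    from sklearn.metrics import roc_auc_score

    # train one epoch at a time and score on the full test set between epochs,
    # so the per-batch running average never enters the comparison
    for epoch in range(15):
        model.fit(x_train, y_train, epochs=1, verbose=0)
        print('model.evaluate:', model.evaluate(x_test, y_test, verbose=0))
        print('sklearn AUC   :', roc_auc_score(y_test, model.predict(x_test).ravel()))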


Something strange:

There isn't any y_pred in your roc_callback. Are you calling a model.predict() inside it?

Daniel Möller
  • I understand there'll be differences but in the later epochs, it's huge. For eg in what I posted above, epoch 12 has `val_auc_roc`=0.87 and at the end of the epoch the callback calculates `roc-auc_val`=0.6369. – HMK Jul 11 '18 at 18:30
  • I've linked to it in my question but [here you go](https://stackoverflow.com/questions/41032551/how-to-compute-receiving-operating-characteristic-roc-and-auc-in-keras/46844409#46844409). Yes it does call `model.predict` inside it. – HMK Jul 11 '18 at 18:37
  • What if you try to make that callback print between batches? For testing, you could set `shuffle=False` in your fit method and compare batch by batch to see. I suspect that because "auc" is based on sorting, it might have big variations when calculated by batch. – Daniel Möller Jul 11 '18 at 18:42
  • I'll try it out. But I suspect it's more what you explained earlier: that it's per-batch statistics that are averaged out. Still, it is pretty unusual for the average batch AUROC to be this far off from the population AUROC. – HMK Jul 11 '18 at 21:48

The auc_roc value printed to the right of the progress bar is the metric you provided to model.compile(). This score is computed by evaluating your defined auc_roc function on the training data one batch at a time. As the model continues training, this value is updated as a running average of the model's performance. Similarly, val_auc_roc is computed by evaluating your auc_roc function on the validation data.

roc-auc_val, on the other hand, is completely defined by the callback you are using, roc_callback. Look at the code you linked to more closely: it determines an AUC score for your model using sklearn's function rather than tensorflow's. Whatever differences appear between the printed auc_roc and roc-auc_val values can be explained by the differences between the two AUC functions being used.
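As a quick sanity check that the two implementations don't agree exactly even on identical inputs, here is a standalone sketch (synthetic random data; it assumes the TF 1.x API, as in your question):

    import numpy as np
    import tensorflow as tf
    from sklearn.metrics import roc_auc_score

    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, 1000).astype(np.float64)
    y_score = rng.rand(1000)

    # sklearn: exact AUC computed from the full ranking of the scores
    print('sklearn roc_auc_score:', roc_auc_score(y_true, y_score))

    # tf.metrics.auc: streaming approximation built from a fixed number of thresholds (200 by default)
    value, update_op = tf.metrics.auc(tf.constant(y_true), tf.constant(y_score))
    with tf.Session() as sess:
        sess.run(tf.local_variables_initializer())  # the metric keeps its state in local variables
        sess.run(update_op)
        print('tf.metrics.auc       :', sess.run(value))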

jonathanking
  • I did try it with TF's AUC function at the epoch end, and I get a value similar to sklearn's rather than the one seen during training. – HMK Jul 11 '18 at 18:25