In my CNN training with TensorFlow, I am using keras.losses.poisson as the loss function. I would like to compute several metrics alongside that loss, and I am observing that keras.metrics.poisson gives different results, although the two are the same function.
See the example output below: the loss and poisson values have different ranges, roughly 0.5 vs. 0.13:
Epoch 1/20
Epoch 00001: val_loss improved from inf to 0.53228, saving model to P:\Data\xyz.h5
- 8174s - loss: 0.5085 - binary_crossentropy: 0.1252 - poisson: 0.1271 - mean_squared_error: 1.2530e-04 - mean_absolute_error: 0.0035 - mean_absolute_percentage_error: 38671.1055 - val_loss: 0.5323 - val_binary_crossentropy: 0.1305 - val_poisson: 0.1331 - val_mean_squared_error: 5.8477e-05 - val_mean_absolute_error: 0.0035 - val_mean_absolute_percentage_error: 1617.8346
Epoch 2/20
Epoch 00002: val_loss improved from 0.53228 to 0.53218, saving model to P:\Data\xyz.h5
- 8042s - loss: 0.5067 - binary_crossentropy: 0.1246 - poisson: 0.1267 - mean_squared_error: 1.0892e-05 - mean_absolute_error: 0.0017 - mean_absolute_percentage_error: 410.8044 - val_loss: 0.5322 - val_binary_crossentropy: 0.1304 - val_poisson: 0.1330 - val_mean_squared_error: 4.9087e-05 - val_mean_absolute_error: 0.0035 - val_mean_absolute_percentage_error: 545.5222
Epoch 3/20
Epoch 00003: val_loss improved from 0.53218 to 0.53199, saving model to P:\Data\xyz.h5
- 8038s - loss: 0.5066 - binary_crossentropy: 0.1246 - poisson: 0.1266 - mean_squared_error: 6.6870e-06 - mean_absolute_error: 0.0013 - mean_absolute_percentage_error: 298.9844 - val_loss: 0.5320 - val_binary_crossentropy: 0.1304 - val_poisson: 0.1330 - val_mean_squared_error: 4.3858e-05 - val_mean_absolute_error: 0.0031 - val_mean_absolute_percentage_error: 452.3541
I found a similar question while typing this one: Keras - Loss and Metric calculated differently? However, I am not using regularization.
In addition, I came across this one, which at least helped me reproduce the issue: Same function in Keras Loss and Metric give different values even without regularization
from tensorflow import keras
layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss='poisson', metrics=['poisson'])
data = [[[[1]]], [[[2]]], [[[3]]]]  # three samples of shape (1, 1, 1)
model.fit(x=data, y=data, batch_size=2, verbose=1)
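To see the two numbers directly rather than in the progress bar, the fit call can be captured in a history object (a small addition of mine, not in the linked question):
h = model.fit(x=data, y=data, batch_size=2, verbose=1)
# loss and the poisson metric disagree even for this identity model:
print(h.history['loss'][0], h.history['poisson'][0])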
What I have found is that, basically, the dimensionality of the data triggers this issue. From the following extended example, you can see that
- the issue can be reproduced with many loss functions (the ones that don't begin with mean_),
- the issue goes away when replacing tensorflow.keras with keras, and
- tensorflow.keras seems to scale the metrics by the batch size if the dimensionality of the data is larger than three.
At least that is my humble interpretation.
The code:
import numpy as np
from tensorflow import keras
# import keras
nSamples = 98765
nBatch = 2345
metric = 'poisson'
# metric = 'squared_hinge'
# metric = 'logcosh'
# metric = 'cosine_proximity'
# metric = 'binary_crossentropy'
# example data: always the same samples
np.random.seed(0)
dataIn = np.random.rand(nSamples)
dataOut = np.random.rand(nSamples)
for dataDim in range(1, 10):
    # reshape samples into shape (1,), (1, 1), ... according to dataDim
    dataIn = np.expand_dims(dataIn, axis=-1)
    dataOut = np.expand_dims(dataOut, axis=-1)
    # build a model that does absolutely nothing
    layer = keras.layers.Input(shape=(1,) * dataDim)
    model = keras.models.Model(inputs=layer, outputs=layer)
    # compile, fit and observe the ratio of loss to metric
    model.compile(optimizer='adam', loss=metric, metrics=[metric])
    history = model.fit(x=dataIn, y=dataOut, batch_size=nBatch, verbose=1)
    lossRatio = history.history['loss'][0] / history.history[metric][0]
    print(lossRatio)
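As a side note on the chosen numbers (a back-of-the-envelope check of mine, not part of the experiment): the batch size does not divide the number of samples here, so the last batch of every epoch is incomplete, which becomes relevant in the update below.
fullBatches, remainder = divmod(nSamples, nBatch)
print(fullBatches, remainder)  # 42 full batches plus a final batch of 275 samples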
I find this behavior inconsistent, to say the least. Should I consider it a bug or a feature?
Update: After further investigation, I have found that the metric values seem to be computed correctly, while the loss values are not; in fact, the losses are weighted sums of the per-sample losses, where the weight of each sample is the size of the batch that sample is in. This has two implications:
- If the batch size divides the number of samples, the weighting of all samples is identical, and the losses are simply off by a factor equal to the batch size.
- If the batch size does not divide the number of samples, then, since batches are usually shuffled, the weighting, and thus the computed loss, changes from one epoch to the next, despite nothing else having changed (see the numeric sketch below). This also applies to metrics such as the MSE.
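To illustrate the claimed weighting with made-up numbers (this is a sketch of my interpretation, not output from Keras):
import numpy as np
# hypothetical per-sample losses for nSamples = 3 and batch_size = 2
sampleLosses = np.array([0.1, 0.2, 0.4])
# what the metric reports: the plain mean over all samples
print(np.mean(sampleLosses))              # 0.2333...
# what the loss reports if the shuffle batches samples 0 and 1 together
# and puts sample 2 alone into the last batch: each sample is weighted
# by the size of its batch, but the sum is still divided by nSamples
print(np.mean(sampleLosses * [2, 2, 1]))  # 0.3333...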
The following code proves these points:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# metric = keras.metrics.poisson
# metricName = 'poisson'
metric = keras.metrics.mse
metricName = 'mean_squared_error'

nSamples = 3
nBatchSize = 2

dataIn = np.random.rand(nSamples, 1, 1, 1)
dataOut = np.random.rand(nSamples, 1, 1, 1)

tf.InteractiveSession()

layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss=metric, metrics=[metric])
h = model.fit(x=dataIn, y=dataOut, batch_size=nBatchSize, verbose=1, epochs=10)

for (historyMetric, historyLoss) in zip(h.history[metricName], h.history['loss']):

    # the metric value is correct and can be reproduced in a number of ways
    kerasMetricOfData = metric(dataOut, dataIn).eval()
    averageMetric = np.mean(kerasMetricOfData)
    assert np.isclose(historyMetric, averageMetric), "..."

    flattenedMetric = metric(dataOut.flatten(), dataIn.flatten()).eval()
    assert np.isclose(historyMetric, flattenedMetric), "..."

    if metric == keras.metrics.poisson:
        numpyMetric = np.mean(dataIn - np.log(dataIn) * dataOut)
        assert np.isclose(historyMetric, numpyMetric), "..."

    # the loss value is incorrect by at least a scaling factor (~ batch size);
    # it also varies *randomly* if the batch size does not divide the number of samples:
    if nSamples == 3:
        incorrectLoss = np.array([
            np.mean(kerasMetricOfData.flatten() * [1, nBatchSize, nBatchSize]),
            np.mean(kerasMetricOfData.flatten() * [nBatchSize, 1, nBatchSize]),
            np.mean(kerasMetricOfData.flatten() * [nBatchSize, nBatchSize, 1]),
        ])
    elif nSamples == 4:
        incorrectLoss = np.mean(kerasMetricOfData) * nBatchSize
    assert np.any(np.isclose(historyLoss, incorrectLoss)), "..."
It outputs:
Epoch 1/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0044 - mean_squared_error: 0.0022
3/3 [==============================] - 0s 5ms/sample - loss: 0.0099 - mean_squared_error: 0.0084
Epoch 2/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 3/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 4/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 5/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 6/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 7/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 8/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 9/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 10/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0044 - mean_squared_error: 0.0022
3/3 [==============================] - 0s 2ms/sample - loss: 0.0099 - mean_squared_error: 0.0084
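Note how mean_squared_error settles at 0.0084 in every epoch, while the final loss jumps between 0.0099, 0.0158 and 0.0163, depending on how the shuffle distributes the three samples over the two batches.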
Update: Finally, there seems to be a difference between using keras.metrics.mse and 'mse', as this example shows:
import numpy as np
from tensorflow import keras
# these three reproduce the issue:
# metric = keras.metrics.poisson
# metric = 'poisson'
# metric = keras.metrics.mse
# this one does not:
metric = 'mse'
nSamples = 3
nBatchSize = 2
dataIn = np.random.rand(nSamples, 1, 1, 1)
dataOut = np.random.rand(nSamples, 1, 1, 1)
layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss=metric, metrics=[metric])
model.fit(x=dataIn, y=dataOut, batch_size=nBatchSize, verbose=1, epochs=10)
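With metric = 'mse', the loss and mean_squared_error columns of the progress output agree in every epoch; with any of the three commented-out alternatives, they diverge just as above.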
I am starting to believe that this must be a bug, and I have reported it here.