In my CNN training with TensorFlow, I am using keras.losses.poisson as the loss function. I would like to compute several metrics alongside that loss, and I am observing that keras.metrics.poisson gives different results, although the two are the same function.
See the example output below: the loss and poisson values have different ranges, roughly 0.5 vs. 0.13:
Epoch 1/20
Epoch 00001: val_loss improved from inf to 0.53228, saving model to P:\Data\xyz.h5
- 8174s - loss: 0.5085 - binary_crossentropy: 0.1252 - poisson: 0.1271 - mean_squared_error: 1.2530e-04 - mean_absolute_error: 0.0035 - mean_absolute_percentage_error: 38671.1055 - val_loss: 0.5323 - val_binary_crossentropy: 0.1305 - val_poisson: 0.1331 - val_mean_squared_error: 5.8477e-05 - val_mean_absolute_error: 0.0035 - val_mean_absolute_percentage_error: 1617.8346
Epoch 2/20
Epoch 00002: val_loss improved from 0.53228 to 0.53218, saving model to P:\Data\xyz.h5
- 8042s - loss: 0.5067 - binary_crossentropy: 0.1246 - poisson: 0.1267 - mean_squared_error: 1.0892e-05 - mean_absolute_error: 0.0017 - mean_absolute_percentage_error: 410.8044 - val_loss: 0.5322 - val_binary_crossentropy: 0.1304 - val_poisson: 0.1330 - val_mean_squared_error: 4.9087e-05 - val_mean_absolute_error: 0.0035 - val_mean_absolute_percentage_error: 545.5222
Epoch 3/20
Epoch 00003: val_loss improved from 0.53218 to 0.53199, saving model to P:\Data\xyz.h5
- 8038s - loss: 0.5066 - binary_crossentropy: 0.1246 - poisson: 0.1266 - mean_squared_error: 6.6870e-06 - mean_absolute_error: 0.0013 - mean_absolute_percentage_error: 298.9844 - val_loss: 0.5320 - val_binary_crossentropy: 0.1304 - val_poisson: 0.1330 - val_mean_squared_error: 4.3858e-05 - val_mean_absolute_error: 0.0031 - val_mean_absolute_percentage_error: 452.3541
I found a similar question while typing this one: Keras - Loss and Metric calculated differently? However, I am not using regularization.
In addition, I came across this one, which at least helped me reproduce the issue: Same function in Keras Loss and Metric give different values even without regularization
from tensorflow import keras
layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss='poisson', metrics=['poisson'])
data = [[[[1]]], [[[2]]], [[[3]]]]  # three samples of shape (1, 1, 1)
model.fit(x=data, y=data, batch_size=2, verbose=1)
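To see the two numbers directly rather than in the progress bar, the fit call can be captured in a history object (a small addition of mine, not in the linked question):
h = model.fit(x=data, y=data, batch_size=2, verbose=1)
# loss and the poisson metric disagree even for this identity model:
print(h.history['loss'][0], h.history['poisson'][0])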
What I have found is that, basically, the dimensionality of the data triggers this issue. From the following extended example, you can see that
- the issue can be reproduced with many loss functions (the ones that don't begin with mean_),
- the issue goes away when replacing tensorflow.keras with keras, and
- tensorflow.keras seems to scale the metrics by the batch size if the dimensionality of the data is larger than three.
At least that is my humble interpretation.
The code:
import numpy as np
from tensorflow import keras
# import keras
nSamples = 98765
nBatch = 2345
metric = 'poisson'
# metric = 'squared_hinge'
# metric = 'logcosh'
# metric = 'cosine_proximity'
# metric = 'binary_crossentropy'
# example data: always the same samples
np.random.seed(0)
dataIn = np.random.rand(nSamples)
dataOut = np.random.rand(nSamples)
for dataDim in range(1, 10):
    # reshape samples into shape (1,), (1, 1), ... according to dataDim
    dataIn = np.expand_dims(dataIn, axis=-1)
    dataOut = np.expand_dims(dataOut, axis=-1)
    # build a model that does absolutely nothing
    layer = keras.layers.Input(shape=(1,) * dataDim)
    model = keras.models.Model(inputs=layer, outputs=layer)
    # compile, fit and observe the ratio of loss to metric
    model.compile(optimizer='adam', loss=metric, metrics=[metric])
    history = model.fit(x=dataIn, y=dataOut, batch_size=nBatch, verbose=1)
    lossRatio = history.history['loss'][0] / history.history[metric][0]
    print(lossRatio)
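As a side note on the chosen numbers (a back-of-the-envelope check of mine, not part of the experiment): the batch size does not divide the number of samples here, so the last batch of every epoch is incomplete, which becomes relevant in the update below.
fullBatches, remainder = divmod(nSamples, nBatch)
print(fullBatches, remainder)  # 42 full batches plus a final batch of 275 samples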
I find this behavior inconsistent, to say the least. Should I consider it a bug or a feature?
Update: After further investigation, I have found that the metric values seem to be computed correctly, while the loss values are not; in fact, the losses are weighted sums of the per-sample losses, where the weight of each sample is the size of the batch that sample is in. This has two implications:
- If the batch size divides the number of samples, the weighting of all samples is identical, and the losses are simply off by a factor equal to the batch size.
- If the batch size does not divide the number of samples, then, since batches are usually shuffled, the weighting, and thus the computed loss, changes from one epoch to the next, despite nothing else having changed (see the numeric sketch below). This also applies to metrics such as the MSE.
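To illustrate the claimed weighting with made-up numbers (this is a sketch of my interpretation, not output from Keras):
import numpy as np
# hypothetical per-sample losses for nSamples = 3 and batch_size = 2
sampleLosses = np.array([0.1, 0.2, 0.4])
# what the metric reports: the plain mean over all samples
print(np.mean(sampleLosses))              # 0.2333...
# what the loss reports if the shuffle batches samples 0 and 1 together
# and puts sample 2 alone into the last batch: each sample is weighted
# by the size of its batch, but the sum is still divided by nSamples
print(np.mean(sampleLosses * [2, 2, 1]))  # 0.3333...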
The following code proves these points:
import numpy as np
import tensorflow as tf
from tensorflow import keras

# metric = keras.metrics.poisson
# metricName = 'poisson'
metric = keras.metrics.mse
metricName = 'mean_squared_error'

nSamples = 3
nBatchSize = 2

dataIn = np.random.rand(nSamples, 1, 1, 1)
dataOut = np.random.rand(nSamples, 1, 1, 1)

tf.InteractiveSession()

layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss=metric, metrics=[metric])
h = model.fit(x=dataIn, y=dataOut, batch_size=nBatchSize, verbose=1, epochs=10)

for (historyMetric, historyLoss) in zip(h.history[metricName], h.history['loss']):

    # the metric value is correct and can be reproduced in a number of ways
    kerasMetricOfData = metric(dataOut, dataIn).eval()
    averageMetric = np.mean(kerasMetricOfData)
    assert np.isclose(historyMetric, averageMetric), "..."

    flattenedMetric = metric(dataOut.flatten(), dataIn.flatten()).eval()
    assert np.isclose(historyMetric, flattenedMetric), "..."

    if metric == keras.metrics.poisson:
        numpyMetric = np.mean(dataIn - np.log(dataIn) * dataOut)
        assert np.isclose(historyMetric, numpyMetric), "..."

    # the loss value is incorrect by at least a scaling factor (~ batch size);
    # it also varies *randomly* if the batch size does not divide the number of samples:
    if nSamples == 3:
        incorrectLoss = np.array([
            np.mean(kerasMetricOfData.flatten() * [1, nBatchSize, nBatchSize]),
            np.mean(kerasMetricOfData.flatten() * [nBatchSize, 1, nBatchSize]),
            np.mean(kerasMetricOfData.flatten() * [nBatchSize, nBatchSize, 1]),
        ])
    elif nSamples == 4:
        incorrectLoss = np.mean(kerasMetricOfData) * nBatchSize
    assert np.any(np.isclose(historyLoss, incorrectLoss)), "..."
It outputs:
Epoch 1/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0044 - mean_squared_error: 0.0022
3/3 [==============================] - 0s 5ms/sample - loss: 0.0099 - mean_squared_error: 0.0084
Epoch 2/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 3/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 4/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 5/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 6/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 7/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 8/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0238 - mean_squared_error: 0.0119
3/3 [==============================] - 0s 2ms/sample - loss: 0.0163 - mean_squared_error: 0.0084
Epoch 9/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0222 - mean_squared_error: 0.0111
3/3 [==============================] - 0s 2ms/sample - loss: 0.0158 - mean_squared_error: 0.0084
Epoch 10/10
2/3 [===================>..........] - ETA: 0s - loss: 0.0044 - mean_squared_error: 0.0022
3/3 [==============================] - 0s 2ms/sample - loss: 0.0099 - mean_squared_error: 0.0084
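Note how mean_squared_error settles at 0.0084 in every epoch, while the final loss jumps between 0.0099, 0.0158 and 0.0163, depending on how the shuffle distributes the three samples over the two batches.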
Update: Finally, there seems to be a difference between using keras.metrics.mse and 'mse', as this example shows:
import numpy as np
from tensorflow import keras
# these three reproduce the issue:
# metric = keras.metrics.poisson
# metric = 'poisson'
# metric = keras.metrics.mse
# this one does not:
metric = 'mse'
nSamples = 3
nBatchSize = 2
dataIn = np.random.rand(nSamples, 1, 1, 1)
dataOut = np.random.rand(nSamples, 1, 1, 1)
layer = keras.layers.Input(shape=(1, 1, 1))
model = keras.models.Model(inputs=layer, outputs=layer)
model.compile(optimizer='adam', loss=metric, metrics=[metric])
model.fit(x=dataIn, y=dataOut, batch_size=nBatchSize, verbose=1, epochs=10)
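With metric = 'mse', the loss and mean_squared_error columns of the progress output agree in every epoch; with any of the three commented-out alternatives, they diverge just as above.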
I am starting to believe that this must be a bug, and I have reported it here.