
I'm following this tutorial https://blog.keras.io/building-autoencoders-in-keras.html, specifically the convolutional example. I don't understand why, if I change the loss function from binary_crossentropy to MSE, it only works on fashion_mnist.

Using mnist, the loss drops after the first epoch and then no longer changes. After training, the predicted images on the test set are just black images. Using fashion_mnist it works perfectly.

import keras
from keras import layers

input_img = keras.Input(shape=(28, 28, 1))

x = layers.Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)

# at this point the representation is (4, 4, 8) i.e. 128-dimensional

x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(16, (3, 3), activation='relu')(x)  # 'valid' padding: 16x16 -> 14x14, so the final upsampling restores 28x28
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mse') # binary_crossentropy
from keras.datasets import mnist
from keras.datasets import fashion_mnist
import numpy as np

(x_train, _), (x_test, _) = mnist.load_data() # fashion_mnist.load_data()

x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))

history = autoencoder.fit(x_train, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True,
                validation_data=(x_test, x_test))
mirq
  • `MSE` doesn't punish misclassifications enough, but it is the right loss for *regression*. For *classification*, `cross-entropy` tends to be more suitable than `MSE`. Coming to your issue, I am not seeing any black images when I try to predict. Can you share your attempt in Google Colab, so that we can help you best? Thanks! –  Mar 15 '21 at 13:16

2 Answers


I guess you want to perform image denoising/reconstruction with an autoencoder. For this kind of task, MSE is the right loss to use, but you should then use a linear activation function at the output layer, so that the reconstructed output pixels can be compared directly with the pixels of the target image. Raw pixels are usually not normalized and commonly take values between 0 and 255. A sigmoid activation squashes your outputs to values between 0 and 1, which is best suited for classification tasks, because it gives you the class probability for 2 classes (softmax for more classes). That activation is then used together with a cross-entropy loss for classification.
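To make the comparison concrete, here is a small numpy illustration (the pixel values are made-up examples, not from the question's data) of how MSE and binary cross-entropy behave on targets scaled to [0, 1]:

```python
import numpy as np

# Hypothetical example pixels, already scaled to [0, 1]
y_true = np.array([0.0, 0.5, 1.0])
y_pred = np.array([0.1, 0.4, 0.9])   # a reasonably good reconstruction

# Mean squared error: penalizes the squared pixel differences
mse = np.mean((y_true - y_pred) ** 2)

# Binary cross-entropy: treats each pixel as a Bernoulli probability
eps = 1e-7  # numerical guard against log(0)
bce = -np.mean(y_true * np.log(y_pred + eps)
               + (1 - y_true) * np.log(1 - y_pred + eps))

# Near a good fit, MSE is much smaller (and flatter) than cross-entropy,
# which is one reason its gradients can be weaker on [0, 1] targets.
assert mse < bce
```

Both losses are well defined on [0, 1] data, so the choice is not about validity but about how strongly errors are penalized.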

Desperate Morty

I ran into a similar issue.

This is my architecture (the same as yours, and the data preprocessing is also the same, so I am not including it here); the layers highlighted in yellow are the ones I was experimenting with:

[screenshot: model summary of the architecture]

But my loss and validation loss looked like this:

[screenshot: training and validation loss curves, stuck at a constant value]

And the reconstructed results were all black images, as you also described.

Solutions for me were:

  1. Reduce the learning rate if you use the Adam optimizer (the default is 0.001; I changed it to 0.00005). Otherwise my loss got stuck at the same value, as in the green chart above.

And this is the result after reducing the learning rate:

[screenshot: loss curves after reducing the learning rate]

  2. Change the optimizer - SGD helped as well (even with its default values):

[screenshot: loss curves with the SGD optimizer]

  3. As suggested in the other answer, changing the last activation function from sigmoid to linear also worked for me on 0-1 scaled data (as I understand from your code, your data is also scaled to the 0-1 interval).
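One plausible mechanism behind the flat loss that all three fixes address (a toy numpy sketch, not code from this answer): with a sigmoid output trained on MSE, a too-large optimizer step can push pre-activations deep into saturation, where the gradient through the sigmoid all but vanishes and the loss stops moving, producing the all-black reconstructions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Local gradient of the sigmoid: sigmoid(z) * (1 - sigmoid(z)).
# It peaks at 0.25 (z = 0) and decays rapidly as |z| grows.
z = np.array([0.0, -8.0])            # a healthy unit vs a saturated one
local_grad = sigmoid(z) * (1 - sigmoid(z))

assert local_grad[0] == 0.25         # maximum slope, at z = 0
assert local_grad[1] < 1e-3          # effectively no gradient when saturated
```

A smaller learning rate (or a gentler optimizer like SGD) makes it less likely that units are thrown into the saturated region, and a linear output activation removes the vanishing factor entirely.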
ds_fan
    Please never use screenshots for code. Always paste it as text with proper formatting. – Ruli Dec 30 '21 at 19:15