
I am following this keras tutorial to create an autoencoder using the MNIST dataset. Here is the tutorial: https://blog.keras.io/building-autoencoders-in-keras.html.

However, I am confused by the choice of activation and loss for the simple one-layer autoencoder (the first example in the link). Is there a specific reason a sigmoid activation was used for the decoder output, as opposed to something such as relu? I am trying to understand whether this is a choice I can play around with, or if it should indeed be sigmoid, and if so why? Similarly, I understand the loss compares the original and reconstructed digits pixel by pixel, but I am unsure why the loss is binary_crossentropy as opposed to something like mean squared error.

I would love clarification on this to help me move forward! Thank you!

Jane Sully
  • [This](https://stackabuse.com/autoencoders-for-image-reconstruction-in-python-and-keras/) is an example of using the mse loss with an image auto encoder. For the rest, see [this](https://stackoverflow.com/questions/52441877/how-does-binary-cross-entropy-loss-work-on-autoencoders) complete answer. – Dany Yatim Jan 04 '20 at 12:12

1 Answer


MNIST images are generally normalized to the range [0, 1], so the autoencoder should output images in the same range to make learning easier. This is why a sigmoid activation is used at the output layer: it constrains every output pixel to [0, 1].
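As a rough sketch of what that looks like in code (using tf.keras; the 32-unit bottleneck follows the tutorial's first example, but the exact sizes are only illustrative):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and scale pixel values into [0, 1] so they match the sigmoid output range.
(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
x_train = x_train.reshape((len(x_train), 784))
x_test = x_test.reshape((len(x_test), 784))

# Single-layer autoencoder: relu in the bottleneck is fine, but the output layer
# uses sigmoid so reconstructions stay in [0, 1] like the inputs.
inputs = keras.Input(shape=(784,))
encoded = layers.Dense(32, activation="relu")(inputs)
decoded = layers.Dense(784, activation="sigmoid")(encoded)
autoencoder = keras.Model(inputs, decoded)
```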

The mean squared error loss penalizes errors quadratically, so large errors are punished much more heavily than small ones; this tends to make the model converge to the mean of the target distribution rather than to a more accurate reconstruction. Binary cross-entropy does not have this problem, and thus it is preferred. It works here because both the model outputs and the targets are in the [0, 1] range, and the loss is applied independently to every pixel.
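Continuing the sketch above: compiling with binary_crossentropy mirrors the tutorial (the adam optimizer here is just one reasonable choice, not something the answer depends on). Swapping in "mse" on the same line is a perfectly valid experiment; it just tends to give blurrier reconstructions for the reason described.

```python
# Per-pixel binary cross-entropy, as in the tutorial. Replacing the loss with
# "mse" also trains, but typically yields blurrier reconstructions.
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")

# The autoencoder learns to reproduce its own input.
autoencoder.fit(x_train, x_train,
                epochs=10,
                batch_size=256,
                shuffle=True,
                validation_data=(x_test, x_test))
```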

Dr. Snoopy