
I have developed a 3-layer deep autoencoder model for the MNIST dataset. I am just practicing on this toy dataset, as I am a beginner in this fine-tuning paradigm.

Following is the code:

from keras import layers
from keras.layers import Input, Dense
from keras.models import Model,Sequential
from keras.datasets import mnist
import numpy as np

# Deep Autoencoder


# this is the size of our encoded representations
encoding_dim = 32   # 32 floats -> compression factor 24.5, assuming the input is 784 floats

# this is our input placeholder; 784 = 28 x 28
input_img = Input(shape=(784, ))

my_epochs = 100

# "encoded" is the encoded representation of the inputs
encoded = Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = Dense(encoding_dim * 2, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu')(encoded)

# "decoded" is the lossy reconstruction of the input
decoded = Dense(encoding_dim * 2, activation='relu')(encoded)
decoded = Dense(encoding_dim * 4, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# Separate Encoder model

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)

# Separate Decoder model

# create a placeholder for an encoded (32-dimensional) input
encoded_input = Input(shape=(encoding_dim, ))
# retrieve the layers of the autoencoder model
decoder_layer1 = autoencoder.layers[-3]
decoder_layer2 = autoencoder.layers[-2]
decoder_layer3 = autoencoder.layers[-1]
# create the decoder model
decoder = Model(encoded_input, decoder_layer3(decoder_layer2(decoder_layer1(encoded_input))))

# Train to reconstruct MNIST digits

# configure model to use a per-pixel binary crossentropy loss, and the Adadelta optimizer
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

# prepare input data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# normalize all values between 0 and 1 and flatten the 28x28 images into vectors of size 784
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_train = x_train.reshape((len(x_train), np.prod(x_train.shape[1:])))
x_test = x_test.reshape((len(x_test), np.prod(x_test.shape[1:])))

# Train the autoencoder for 100 epochs

autoencoder.fit(x_train, x_train, epochs=my_epochs, batch_size=256, shuffle=True, validation_data=(x_test, x_test),
                verbose=2)

# after 100 epochs the autoencoder seems to reach a stable train/test loss value

# Visualize the reconstructed encoded representations

# encode and decode some digits
# note that we take them from the *test* set
encodedTrainImages = encoder.predict(x_train)
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)





# From here I want to fine tune just the encoder model
model = Sequential()
for layer in encoder.layers:
  model.add(layer)
model.add(layers.Flatten())
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))

Following is my encoder model which I want to fine-tune.

encoder.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 784)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
=================================================================
Total params: 110,816
Trainable params: 110,816
Non-trainable params: 0
_________________________________________________________________

Problem 1:

After building the autoencoder model I want to use just the encoder part and fine-tune it for a classification task on the MNIST dataset, but I am getting errors.

Error:

Traceback (most recent call last):
  File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\IPython\core\interactiveshell.py", line 3267, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-15-528c079e5325>", line 3, in <module>
    model.add(layers.Flatten())
  File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\sequential.py", line 181, in add
    output_tensor = layer(self.outputs[0])
  File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\base_layer.py", line 414, in __call__
    self.assert_input_compatibility(inputs)
  File "C:\Users\samer\Anaconda3\envs\tensorflow-gpu\lib\site-packages\keras\engine\base_layer.py", line 327, in assert_input_compatibility
    str(K.ndim(x)))
ValueError: Input 0 is incompatible with layer flatten_4: expected min_ndim=3, found ndim=2

Problem 2:

Similarly, I would later like to use a pre-trained model where each autoencoder is trained in a greedy manner and the final model is then fine-tuned. Can somebody guide me on how to proceed with these two tasks?

  • I am not sure what you mean by **fine_tune**; it seems to me like you are trying to use the encoder "as-is" and add layers to it, which is **transfer learning**. – Benjamin Breton May 14 '19 at 13:14
  • You are right, it is pretty much that, but I believe transfer learning is when you transfer a model from one domain to another, where you either don't have a lot of training data or you think the learned model has picked up features that would be useful in your new domain. In my case I am in the same domain with the same dataset, and just practicing how these things work in code. – Naseer May 14 '19 at 13:22
  • Could you explain in more detail what you want to achieve in Problem 2? Do you intend to have several autoencoders stacked one after the other, or do you want a "parallel" structure where each autoencoder is specialized in one task and some kind of voting/concatenation is performed at the end? Or something else? – DLM May 19 '19 at 08:53
  • Can you show us the code you are running to get the error in Problem 1? I think the second question is too broad; you should try narrowing it down and clarifying it. – Zaccharie Ramzi May 19 '19 at 13:29

1 Answer


Problem 1

The problem is that you are trying to flatten a layer that is already flat: your encoder is made up of one-dimensional Dense layers, whose outputs have shape (batch_size, dim).

The Flatten layer expects an input with at least 3 dimensions (including the batch dimension), e.g. a tensor of shape (batch_size, dim1, dim2) such as the output of a Conv2D layer. By removing it, the model builds properly:

encoding_dim = 32
input_img = layers.Input(shape=(784, ))

encoded = layers.Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = layers.Dense(encoding_dim * 2, activation='relu')(encoded)
encoded = layers.Dense(encoding_dim, activation='relu')(encoded)

encoder = Model(input_img, encoded)

[...]

model = Sequential()
for layer in encoder.layers:
    print(layer.name)
    model.add(layer)
model.add(layers.Dense(20, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))

model.summary()

Which outputs:

input_1
dense_1
dense_2
dense_3
Model: "sequential_1"
________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_2 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_4 (Dense)              (None, 20)                660       
_________________________________________________________________
dropout_1 (Dropout)          (None, 20)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                210       
=================================================================
Total params: 111,686
Trainable params: 111,686
Non-trainable params: 0
_________________________________________________________________
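
To actually fine-tune this classifier, a possible next step (a sketch of my own, not part of the original fix) is to compile it and train it on the MNIST labels; here I assume the labels have already been one-hot encoded as y_train_cat / y_test_cat (see the edit below):

# Sketch only: compile and fine-tune the classifier built on top of the
# pre-trained encoder layers. y_train_cat / y_test_cat are assumed to be
# one-hot encoded labels (see the edit below).
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train_cat, epochs=20, batch_size=256, shuffle=True,
          validation_data=(x_test, y_test_cat), verbose=2)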

___

Edit: integrating answers to questions in the comments

Q: How can I be sure that the new model will be using the same weights as the previously trained encoder?

A: In your code, you are iterating through the layers contained inside the encoder and passing each of them to model.add(). You are passing a reference to each layer directly, so your new model contains the very same layer objects. Here is a proof of concept using the layer names:

encoding_dim = 32

input_img = Input(shape=(784, ))

encoded = Dense(encoding_dim * 4, activation='relu')(input_img)
encoded = Dense(encoding_dim * 2, activation='relu')(encoded)

encoded = Dense(encoding_dim, activation='relu')(encoded)

decoded = Dense(encoding_dim * 2, activation='relu')(encoded)
decoded = Dense(encoding_dim * 4, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)

print("autoencoder first Dense layer reference:", autoencoder.layers[1])

encoder = Model(input_img, encoded)

print("encoder first Dense layer reference:", encoder.layers[1])

new_model = Sequential()
for i, layer in enumerate(encoder.layers):
  print("Before: ", layer.name)
  new_model.add(layer)
  if i != 0:
    new_model.layers[i-1].name = "new_model_"+layer.name
    print("After: ", layer.name)

Which outputs:

autoencoder first Dense layer reference: <keras.layers.core.Dense object at 
0x7fb5f138e278>
encoder first Dense layer reference: <keras.layers.core.Dense object at 
0x7fb5f138e278>
Before:  input_1
Before:  dense_1
After:  new_model_dense_1
Before:  dense_2
After:  new_model_dense_2
Before:  dense_3
After:  new_model_dense_3

As you can see, the layer references in the encoder and in the autoencoder are the same. What's more, by changing a layer's name inside the new model we are also changing the name of the corresponding layer inside the encoder. For more details on Python arguments being passed by reference, check out this answer.
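
If you want an extra sanity check (a sketch of my own, assuming the encoder and new_model built above are still in scope), you can verify that the shared layers are literally the same objects and therefore hold the same weights:

import numpy as np

# The first Dense layer of the encoder and the first layer of the new
# Sequential model are the same object, so their weight arrays match.
print(encoder.layers[1] is new_model.layers[0])              # True
print(np.array_equal(encoder.layers[1].get_weights()[0],
                     new_model.layers[0].get_weights()[0]))  # True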


Q: Do I need one-hot encoding for my data? If so, how?

A: You do need one-hot encoding, since you are dealing with a multi-class classification problem. The encoding is done with a handy Keras utility function:

from keras.utils import np_utils

one_hot = np_utils.to_categorical(y_train)

Here's a link to the documentation.
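
As a quick shape check (a sketch of my own, using the same MNIST labels loaded earlier), to_categorical turns the integer class labels into one-hot rows with one column per class:

from keras.datasets import mnist
from keras.utils import np_utils

(_, y_train), (_, y_test) = mnist.load_data()
y_train_cat = np_utils.to_categorical(y_train)
y_test_cat = np_utils.to_categorical(y_test)

print(y_train.shape)       # (60000,)
print(y_train_cat.shape)   # (60000, 10)
print(y_train[0], y_train_cat[0])  # e.g. 5 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]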

___


Problem 2

Regarding your second question, it is not entirely clear what you are aiming at; however, it seems to me that you want to build an architecture containing several parallel autoencoders, each specialized in a different task, and then concatenate their outputs through some final, common layers.

In any case, what I can suggest so far is to take a look at this guide, which explains how to build multi-input and multi-output models, and to use it as a baseline for your custom implementation.
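
For instance, a minimal sketch of such a structure using the functional API (my own illustration, not taken from the linked guide) could look like this:

from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Two parallel "encoder" branches reading the same flattened image...
inp = Input(shape=(784,))
branch_a = Dense(64, activation='relu')(inp)   # e.g. first specialized encoder
branch_b = Dense(64, activation='relu')(inp)   # e.g. second specialized encoder

# ...whose outputs are concatenated and fed to a common classification head.
merged = concatenate([branch_a, branch_b])
out = Dense(10, activation='softmax')(merged)

multi_branch = Model(inp, out)
multi_branch.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])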

___

Edit 2: Problem 2 answer integration

Regarding the greedy training task, the approach is to train one layer at a time, freezing all the previous ones as you append new ones. Here's an example for a 3(+1) greedy-trained-layer network, which is later used as the base for a new model:

from keras.datasets import mnist
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import np_utils
import numpy as np

# Load MNIST, one-hot encode the labels and flatten the images
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
x_train = np.reshape(x_train, (x_train.shape[0], -1))
x_test = np.reshape(x_test, (x_test.shape[0], -1))

model = Sequential()
model.add(Dense(256, activation="relu", kernel_initializer="he_uniform", input_shape=(28*28,)))
model.add(Dense(10, activation="softmax"))

model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=1)

# Remove the classification layer
model.pop()

# 'Freeze' the previous layers, so that only the new ones are trained
for layer in model.layers:
    layer.trainable = False

# Append a new hidden layer + a fresh classification layer
model.add(Dense(64, activation="relu", kernel_initializer="he_uniform"))
model.add(Dense(10, activation="softmax"))

# Re-compile so that the changed trainable flags and the new layers take effect
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=0)

# Remove the classification layer
model.pop()

# 'Freeze' the previous layers, so that only the new ones are trained
for layer in model.layers:
    layer.trainable = False

# Append a new hidden layer + a fresh classification layer
model.add(Dense(32, activation="relu", kernel_initializer="he_uniform"))
model.add(Dense(10, activation="softmax"))

# Re-compile so that the changed trainable flags and the new layers take effect
model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=50, verbose=0)

# Create new model which will use the pre-trained layers
new_model = Sequential()

# Discard the last layer from the previous model
model.pop()

# Optional: make the pre-trained layers trainable again, in which case the
# greedy training effectively acts as a weight initialization; leave them
# frozen otherwise.
for l in model.layers:
    l.trainable = True
new_model.add(model)

new_model.add(Dense(20, activation='relu'))
new_model.add(Dropout(0.5))
new_model.add(Dense(10, activation='softmax'))

new_model.compile(optimizer=SGD(lr=0.01, momentum=0.9), loss="categorical_crossentropy", metrics=["accuracy"])
new_model.fit(x_train, y_train, batch_size=64, epochs=100, verbose=1)

This is roughly it. However, I must say that greedy layer-wise training may no longer be the proper solution: nowadays ReLU, Dropout and other regularization techniques make greedy layer training an obsolete and time-consuming form of weight initialization, so you might want to look at other possibilities as well before going for greedy training.

___

  • You are right, in the second case I am looking for layer-wise training, so please, if you have time, you can do that; I am also working on mine. One question relevant to Problem 1: how can I be sure that the fine-tuned encoder is using the trained weights I just got from the autoencoder and is not re-initializing the weights from scratch? And for the softmax layer at the output, do I need one-hot encoding for my data, and if so, how? – Naseer May 21 '19 at 04:28
  • You can be sure about that because you are passing the layer itself to model.add(), which is done by reference, not a new layer, not even a copy. Also, yes, you do need one-hot encoding for your data, and it is simply done using the to_categorical Keras utility function. I will edit my answer adding a proof of concept for both of these issues. Regarding the second problem, I will try to find some time to provide a small example, but you should really clarify what you want to do; perhaps you can give an example? – DLM May 21 '19 at 10:57
  • Please also complete Problem 2 for just 3 hidden layers of sizes 256, 64 and 32, and finally fine-tune the network for classification based on the greedily trained layers, so that the answer is complete and I can award this bounty to you. – Naseer May 22 '19 at 03:43
  • I integrated answer 2 with an example, hope it helps! – DLM May 22 '19 at 14:10