
After reading the answer to this question, I am a bit confused as to when exactly TensorFlow initializes the weight and bias variables. As per the answers, compile() defines the loss function, the optimizer and the metrics. That's all.

Since the compile() method doesn't initialize them, that would suggest it happens during the fit() method run.

However, the issue with that is: in the case of loading models or loading weights, how would fit() know that the weights it is presented with are actually useful and should not be thrown away and replaced with random values?

We pass the type of initializer in the kernel_initializer argument while declaring the layer. For example:

dense02 = tf.keras.layers.Dense(units=10,
                                kernel_initializer='glorot_uniform',
                                bias_initializer='zeros')
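
(As an aside, the string 'glorot_uniform' is shorthand for an initializer object; an equivalent object form, which also lets you fix a seed, would look something like this sketch:)

import tensorflow as tf

dense02 = tf.keras.layers.Dense(
    units=10,
    kernel_initializer=tf.keras.initializers.GlorotUniform(seed=42),
    bias_initializer=tf.keras.initializers.Zeros())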

So an obvious question would be whether the weights are initialized layer by layer during the first epoch's forward pass, or whether it happens for all layers before the first epoch.

(What I am trying to say is: if there are, say, 5 Dense layers in the model, does the initialization happen one layer at a time, i.e. the first Dense layer gets initialized, then the forward pass happens for that layer, then the second layer is initialized and the forward pass for the second Dense layer happens, and so on?)

Another aspect is transfer learning: when stacking custom layers on top of a trained model, the trained model's layers have their weights, while the layers I added wouldn't have any useful weights. So how would TensorFlow know to only initialize the variables of the layers I added, and not mess up the layers of the transferred model (provided I don't set trainable=False)?

How does TensorFlow or Keras handle weight initialization?

mb0850

2 Answers


The weights are initialized when the model is created (when each layer in the model is initialized), i.e. before compile() and fit():

import tensorflow as tf
from tensorflow.keras import models, layers

inputs = layers.Input((3, ))
outputs = layers.Dense(units=10,
                       kernel_initializer='glorot_uniform',
                       bias_initializer='zeros')(inputs)

model = models.Model(inputs=inputs, outputs=outputs)

for layer in model.layers: 
    print("Config:\n{}\nWeights:\n{}\n".format(layer.get_config(), layer.get_weights()))

Outputs:

Config:
{'batch_input_shape': (None, 3), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}
Weights:
[]

Config:
{'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 10, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}
Weights:
[array([[-0.60352975,  0.08275259, -0.6521113 , -0.5860774 , -0.42276743,
        -0.3142944 , -0.28118378,  0.07770532, -0.5644444 , -0.47069687],
       [ 0.4611913 ,  0.35170448, -0.62191975,  0.5837332 , -0.3390234 ,
        -0.4033073 ,  0.03493106, -0.06078851, -0.53159714,  0.49872506],
       [ 0.43685734,  0.6160207 ,  0.01610583, -0.3673877 , -0.14144647,
        -0.3792309 ,  0.05478126,  0.602067  , -0.47438127,  0.36463356]],
      dtype=float32), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)]
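
To address the loading concern from the question: a minimal sketch (the file name and shapes are illustrative) showing that load_weights() simply overwrites the already-initialized variables, and that fit() trains from whatever values the variables currently hold rather than re-running the initializers:

import numpy as np
import tensorflow as tf
from tensorflow.keras import models, layers

model = models.Sequential([layers.Input((3,)), layers.Dense(units=10)])
model.save_weights('demo.weights.h5')      # pretend these are trained weights

model2 = models.Sequential([layers.Input((3,)), layers.Dense(units=10)])
model2.load_weights('demo.weights.h5')     # overwrites the fresh random values

model2.compile(optimizer='sgd', loss='mse')
model2.fit(np.ones((4, 3)), np.ones((4, 10)), epochs=1, verbose=0)
# Any change in model2's weights now comes from gradient updates,
# not from re-initialization.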
Mr. For Example
  • How does TF handle the transfer learning weights and the weights of the layers stacked on top of it? – mb0850 Dec 27 '20 at 18:55
  • It's basically the same, except that the backbone model loads its weights from file after the model is initialized, or the entire model is loaded from file – Mr. For Example Dec 28 '20 at 01:00
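
A minimal sketch of that comment's point (the backbone choice is illustrative; any saved or built model behaves the same way): the backbone arrives with its layers already built and its trained weights loaded, while the new head is built and randomly initialized the moment it is connected:

import tensorflow as tf
from tensorflow.keras import models, layers

# Backbone with pre-trained weights: its variables are created and then
# overwritten by the downloaded weights, not re-initialized later.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights='imagenet')

# The new head gets fresh glorot_uniform/zeros values when it is built here;
# the backbone's variables are untouched.
x = layers.GlobalAveragePooling2D()(base.output)
outputs = layers.Dense(5)(x)
model = models.Model(base.input, outputs)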

After doing a bit more research: even though Mr. For Example's answer is correct, let's dig a bit deeper into how initialization works in TensorFlow Keras.

As per the tf.keras.layers.Layer docs, we can create variables in the following two methods:

  • __init__(self, ...): Defines custom layer attributes, and creates layer state variables that do not depend on input shapes, using add_weight()
  • build(self, input_shape): This method can be used to create weights that depend on the shape(s) of the input(s), using add_weight()

The code below shows an example of a basic layer with 2 variables that does the computation y = w·x + b:

import tensorflow as tf
from tensorflow.keras.layers import Layer

class SimpleDense(Layer):

    def __init__(self, units=32):
        super(SimpleDense, self).__init__()
        self.units = units

    def build(self, input_shape):  # Create the state of the layer (weights)
        w_init = tf.random_normal_initializer()
        self.w = tf.Variable(
            initial_value=w_init(shape=(input_shape[-1], self.units),
                                 dtype='float32'),
            trainable=True)
        b_init = tf.zeros_initializer()
        self.b = tf.Variable(
            initial_value=b_init(shape=(self.units,), dtype='float32'),
            trainable=True)

    def call(self, inputs):  # Defines the computation from inputs to outputs
        return tf.matmul(inputs, self.w) + self.b

# Instantiates the layer.
linear_layer = SimpleDense(4)

# This will also call `build(input_shape)` and create the weights.
y = linear_layer(tf.ones((2, 2)))
assert len(linear_layer.weights) == 2

# These weights are trainable, so they're listed in `trainable_weights`:
assert len(linear_layer.trainable_weights) == 2

The most interesting thing to note in the above code is when the build method is called.

build() is called the first time the layer (after it has been instantiated) is given some sort of input, whether actual values or just a symbolic tensor/placeholder.

When using a Keras Sequential model, as we add a layer to the model, Keras automatically assigns the input placeholder to the layer (provided the input shape is known), thereby building and initializing it at the same time.

Thus we see the weights before compile() or fit() is called on the Keras Model. (Note that __call__() will automatically build the layer, if it has not been built yet, by calling build().)
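
A small sketch of that behaviour (assuming the standard tf.keras API): with a known input shape the weights exist as soon as the layer is added; without one, building is deferred until the model sees a shape:

import tensorflow as tf
from tensorflow.keras import models, layers

# Input shape known: the Dense layer is built immediately.
m1 = models.Sequential([layers.Input((3,)), layers.Dense(10)])
print(len(m1.layers[0].get_weights()))  # 2 -> kernel and bias already exist

# No input shape: building is deferred.
m2 = models.Sequential([layers.Dense(10)])
print(m2.layers[0].get_weights())       # [] -> not built yet
m2.build(input_shape=(None, 3))         # or just call m2 on real data
print(len(m2.layers[0].get_weights()))  # 2 -> built now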

Regarding transfer learning: when we load the transferred model, we are loading already-built layers, so the build() method is not called again when you add those layers to your own model.

In other words, the layers of the transferred model have already had the input placeholder assigned to them, and the build() method was already called when the transferred model was originally built and trained.
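
One way to see this (a sketch; the file name is illustrative and assumes a model was previously saved with model.save()):

import tensorflow as tf

loaded = tf.keras.models.load_model('trained_model.h5')
print(all(layer.built for layer in loaded.layers))  # True -> nothing left to build

# Only this new layer gets freshly initialized weights when it is built:
new_outputs = tf.keras.layers.Dense(5)(loaded.output)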


mb0850