
In this notebook https://nbviewer.jupyter.org/github/krasserm/bayesian-machine-learning/blob/master/bayesian_neural_networks.ipynb, the author defines the function

def mixture_prior_params(sigma_1, sigma_2, pi):
    params = K.variable([sigma_1, sigma_2, pi], name='mixture_prior_params')
    sigma = np.sqrt(pi * sigma_1 ** 2 + (1 - pi) * sigma_2 ** 2)
    return params, sigma

which creates a variable and returns it in a tuple together with the prior's standard deviation. This function is then called as follows

prior_params, prior_sigma = mixture_prior_params(sigma_1=1.0, sigma_2=0.1, pi=0.2)

Then, in the build method of the custom layer DenseVariational, the global variable prior_params is appended to the layer's private list _trainable_weights:

def build(self, input_shape):
    self._trainable_weights.append(prior_params)
    ...

Why would one need or want to do this? If I print the trainable parameters of either the custom layer or of a model built from this custom layer, for example

# Create the model with DenseVariational layers
model = Model(x_in, x_out)
print("model.trainable_weights =", model.trainable_weights)

I can see that each DenseVariational layer lists mixture_prior_params among its trainable parameters. Why should one declare mixture_prior_params (more specifically, sigma_1, sigma_2 and pi) outside of the layer if they are trainable parameters of the layer?


1 Answer


After looking at the question Can I share weights between keras layers but have other parameters differ? and its answer (https://stackoverflow.com/a/45258859/3924118), and after printing the values of the model's trainable variables once the model has been trained, it appears that this is a way of sharing a single variable across different layers: after training, that variable has the same value in every layer that registered it.

I have created a simple example (with TensorFlow 2.0.0 and Keras 2.3.1) that shows this

import numpy as np
from keras import activations, initializers
from keras import backend as K
from keras import optimizers
from keras.layers import Input
from keras.layers import Layer
from keras.models import Model

# A module-level variable that every layer instance below registers as one of
# its trainable weights, so a single value is shared (and trained) across layers.
shared_variable = K.variable([0.3], name='my_shared_variable')


class MyLayer(Layer):
    def __init__(self, output_dim, activation=None, **kwargs):
        self.output_dim = output_dim
        self.activation = activations.get(activation)
        super().__init__(**kwargs)

    def build(self, input_shape):
        # Register the module-level variable as a trainable weight of this layer.
        # Every layer instance appends the *same* variable object, which is what
        # makes it shared: the optimizer updates one underlying tensor.
        self._trainable_weights.append(shared_variable)
        # An ordinary per-layer weight matrix, created in the usual way.
        self.my_weight = self.add_weight(name='my_weight',
                                         shape=(input_shape[1], self.output_dim),
                                         initializer=initializers.normal(),
                                         trainable=True)
        super().build(input_shape)

    def call(self, x):
        # The layer's own weights are scaled by the shared variable.
        return self.activation(K.dot(x, self.my_weight * shared_variable))

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.output_dim


if __name__ == "__main__":
    # Define the architecture of the model.
    x_in = Input(shape=(1,))
    h1 = MyLayer(20, activation='relu')(x_in)
    h2 = MyLayer(20, activation='relu')(h1)
    x_out = MyLayer(1)(h2)

    model = Model(x_in, x_out)
    print("h1.trainable_weights (before training) =", model.layers[1].trainable_weights[0])
    print("h2.trainable_weights (before training) =", model.layers[2].trainable_weights[0])

    # Prepare the model for training.
    model.compile(loss="mse", optimizer=optimizers.Adam(lr=0.03))

    # Generate dataset.
    X = np.linspace(-0.5, 0.5, 100).reshape(-1, 1)
    y = 10 * np.sin(2 * np.pi * X)

    # Train the model.
    model.fit(X, y, batch_size=1, epochs=100, verbose=0)

    print("h1.trainable_weights (after training) =", model.layers[1].trainable_weights[0])
    print("h2.trainable_weights (after training) =", model.layers[2].trainable_weights[0])

The output is

h1.trainable_weights (before training) = <tf.Variable 'my_shared_variable:0' shape=(1,) dtype=float32, numpy=array([0.3], dtype=float32)>
h2.trainable_weights (before training) = <tf.Variable 'my_shared_variable:0' shape=(1,) dtype=float32, numpy=array([0.3], dtype=float32)>
h1.trainable_weights (after training) = <tf.Variable 'my_shared_variable:0' shape=(1,) dtype=float32, numpy=array([0.7049409], dtype=float32)>
h2.trainable_weights (after training) = <tf.Variable 'my_shared_variable:0' shape=(1,) dtype=float32, numpy=array([0.7049409], dtype=float32)>
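
For contrast, here is a minimal sketch of the same layer without the sharing (a hypothetical MyUnsharedLayer, not taken from the notebook, using the same Keras version as above): if the scale parameter is instead created inside build with add_weight, each layer instance owns its own independent copy, so the layers no longer train one common value the way they do with my_shared_variable.

from keras import activations, initializers
from keras import backend as K
from keras.layers import Layer


class MyUnsharedLayer(Layer):
    def __init__(self, output_dim, activation=None, **kwargs):
        self.output_dim = output_dim
        self.activation = activations.get(activation)
        super().__init__(**kwargs)

    def build(self, input_shape):
        # Each layer instance creates its *own* scale parameter here, so nothing
        # is shared: after training, the values in different layers will
        # generally differ.
        self.my_scale = self.add_weight(name='my_scale',
                                        shape=(1,),
                                        initializer=initializers.constant(0.3),
                                        trainable=True)
        self.my_weight = self.add_weight(name='my_weight',
                                         shape=(input_shape[1], self.output_dim),
                                         initializer=initializers.normal(),
                                         trainable=True)
        super().build(input_shape)

    def call(self, x):
        return self.activation(K.dot(x, self.my_weight * self.my_scale))

    def compute_output_shape(self, input_shape):
        return input_shape[0], self.output_dim

With this version, model.layers[1].trainable_weights and model.layers[2].trainable_weights would each contain a distinct my_scale variable rather than one shared my_shared_variable.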