
In this other question, it was shown that one can reuse a Dense layer on different Input layers to enable weight sharing. I am now wondering how to extend this principle to an entire block of layers; my attempt is as follows:

from keras.layers import Input, Dense, BatchNormalization, PReLU
from keras.initializers import Constant
from keras import backend as K

def embedding_block(dim):
    dense = Dense(dim, activation=None, kernel_initializer='glorot_normal')
    activ = PReLU(alpha_initializer=Constant(value=0.25))(dense)
    bnorm = BatchNormalization()(activ)
    return bnorm

def embedding_stack():
    return embedding_block(32)(embedding_block(16)(embedding_block(8)))
common_embedding = embedding_stack()

Here I am creating "embedding blocks", each containing a single dense layer of variable dimension, which I am trying to string together into an "embedding stack" made of blocks of increasing dimension. I would then like to apply this "common embedding" to several Input layers (all of the same shape) so that the weights are shared.

The above code fails with:

<ipython-input-33-835f06ed7bbb> in embedding_block(dim)
      1 def embedding_block(dim):
      2     dense = Dense(dim, activation=None, kernel_initializer='glorot_normal')
----> 3     activ = PReLU(alpha_initializer=Constant(value=0.25))(dense)
      4     bnorm = BatchNormalization()(activ)
      5     return bnorm

/localenv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in __call__(self, *args, **kwargs)
    980       with ops.name_scope_v2(name_scope):
    981         if not self.built:
--> 982           self._maybe_build(inputs)
    983 
    984         with ops.enable_auto_cast_variables(self._compute_dtype_object):

/localenv/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in _maybe_build(self, inputs)
   2641         # operations.
   2642         with tf_utils.maybe_init_scope(self):
-> 2643           self.build(input_shapes)  # pylint:disable=not-callable
   2644       # We must set also ensure that the layer is marked as built, and the build
   2645       # shape is stored since user defined build functions may not be calling

/localenv/lib/python3.8/site-packages/tensorflow/python/keras/utils/tf_utils.py in wrapper(instance, input_shape)
    321     if input_shape is not None:
    322       input_shape = convert_shapes(input_shape, to_tuples=True)
--> 323     output_shape = fn(instance, input_shape)
    324     # Return shapes from `fn` as TensorShapes.
    325     if output_shape is not None:

/localenv/lib/python3.8/site-packages/tensorflow/python/keras/layers/advanced_activations.py in build(self, input_shape)
    138   @tf_utils.shape_type_conversion
    139   def build(self, input_shape):
--> 140     param_shape = list(input_shape[1:])
    141     if self.shared_axes is not None:
    142       for i in self.shared_axes:

TypeError: 'NoneType' object is not subscriptable

What is the proper way to do this? Thanks!

Demosthene
  • You can extend `tf.keras.Layer` and define the "sublayers" within. Then define the shared weights variable in your extended Layer and replace the weights of your "sublayers". – Sebastian R. Dec 10 '20 at 11:39
  • Is it like you want to build a common block and use it at various stages of your model by invoking it with different parameters (i.e. `kernel_size`, `stride`, `padding`, etc.)? – Innat Dec 10 '20 at 12:03
  • Hi @SebastianR. ah I was hoping to do it without having to redefine anything... :/ – Demosthene Dec 10 '20 at 12:32
  • Hi @M.Innat the embedding_stack() function is just here for convenience - I want to define one such stack for the model, and pass different input layers to it such that the weights are shared. I can then collect the output from the embedding of each input, concatenate them, and go do something with that. But the embedding stack itself is trained on all inputs together with one set of weights. Does that make sense? – Demosthene Dec 10 '20 at 12:35
  • I don't think I fully get it. Would you please add a diagram of what you want? That would make it much clearer. You can also check [this blog](https://towardsdatascience.com/model-sub-classing-and-custom-training-loop-from-scratch-in-tensorflow-2-cc1d4f10fb4e), which I wrote a few days ago. Please see the building block of the small Inception model; is that something you want? – Innat Dec 10 '20 at 12:52
  • @M.Innat ok here's an example: I want to have a common block BatchNorm(PReLU(Dense(8)(input))) applied to two Input heads of the same dimension (4). I can then concatenate the two output tensors (dimension 8 each) and pass them to the rest of the network (e.g. a Dense(16) layer). This is easy to do if I define two such BatchNorm(PReLU(Dense(8)(input))) sequences, one for each input (it's basically just concatenating two separate models). What I want is to have the weights _shared_ between these two branches. Is that clearer? – Demosthene Dec 10 '20 at 13:03

1 Answer


The error occurs because `PReLU(...)` is called on the `Dense` layer object itself rather than on a tensor, so Keras cannot infer an input shape when building the PReLU layer. You need to dissociate layer instantiation from model creation: instantiate each layer once, then call the same instances on every input that should share weights.
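As a minimal illustration of the sharing mechanism (the layer sizes below are arbitrary): a layer instance owns its weights, so calling the same instance on two tensors reuses them instead of creating new ones.

from tensorflow.keras import layers

shared = layers.Dense(8)       # instantiated once -> one set of weights
a = layers.Input((4,))
b = layers.Input((4,))
out_a = shared(a)              # both calls reuse the same kernel and bias
out_b = shared(b)
print(len(shared.weights))     # 2 (one kernel, one bias), not 4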

Here is a simple method using a for loop:

from tensorflow.keras import layers, initializers, Model

def embedding_block(dim):
    # Only instantiate the layers here; they are not called yet.
    dense = layers.Dense(dim, activation=None, kernel_initializer='glorot_normal')
    activ = layers.PReLU(alpha_initializer=initializers.Constant(value=0.25))
    bnorm = layers.BatchNormalization()
    return [dense, activ, bnorm]

# A single flat list of layer instances, shared by every branch.
stack = embedding_block(8) + embedding_block(16) + embedding_block(32)

inp1 = layers.Input((5,))
inp2 = layers.Input((5,))

# Apply the same layer instances to both inputs so that the weights are shared.
x, y = inp1, inp2
for layer in stack:
    x = layer(x)
    y = layer(y)

concat_layer = layers.Concatenate()([x, y])
pred = layers.Dense(1, activation="sigmoid")(concat_layer)

model = Model(inputs=[inp1, inp2], outputs=pred)

We first create each layer once, then iterate through them with the functional API, applying the same instances to both inputs to build the model.
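If you prefer to keep the block as a single reusable object, the same sharing can also be obtained (a sketch reusing the embedding_block helper above) by wrapping the layer list in a tf.keras.Sequential and calling that sub-model on each input:

from tensorflow.keras import layers, Model, Sequential

# Build the shared stack once as a Sequential "sub-model"; a Model is itself a
# layer, so calling it on several inputs reuses its weights.
shared_stack = Sequential(
    embedding_block(8) + embedding_block(16) + embedding_block(32),
    name="shared_embedding",
)

inp1 = layers.Input((5,))
inp2 = layers.Input((5,))

x = shared_stack(inp1)
y = shared_stack(inp2)

pred = layers.Dense(1, activation="sigmoid")(layers.Concatenate()([x, y]))
model = Model(inputs=[inp1, inp2], outputs=pred)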

You can analyze the network in Netron to see that the weights are indeed shared:

Netron visualization of the network: the two inputs at the top pass through the same shared layers.
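Alternatively, as a quick in-code sanity check (a sketch assuming the model built above), you can confirm that each shared layer appears only once in the model and therefore contributes its parameters a single time:

# Each shared layer shows up exactly once, even though both branches pass
# through it, so its parameters are counted (and trained) only once.
model.summary()

shared_dense = [l for l in model.layers if isinstance(l, layers.Dense)]
print([(d.name, d.count_params()) for d in shared_dense])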

Lescurel