
In Keras, is it possible to share weights between two layers while letting other parameters differ? Consider the following (admittedly a bit contrived) example:

    conv1 = Conv2D(64, 3, input_shape=input_shape, padding='same')
    conv2 = Conv2D(64, 3, input_shape=input_shape, padding='valid')

Notice that the layers are identical except for the padding. Can I get Keras to use the same weights for both (and train the network accordingly)?

I've looked at the Keras docs, and the section on shared layers seems to imply that sharing works only if the layers are completely identical.
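
For context, my understanding of the documented shared-layer pattern is that you reuse a single layer instance, so every constructor argument (including the padding) is necessarily the same for each call; a small sketch with made-up input shapes:

    from keras.layers import Input, Conv2D

    input_a = Input(shape=(32, 32, 3))
    input_b = Input(shape=(32, 32, 3))

    shared_conv = Conv2D(64, 3, padding='same')  # a single instance...
    out_a = shared_conv(input_a)  # ...so both calls share the weights *and* the padding
    out_b = shared_conv(input_b)

That is exactly what I'm trying to avoid here.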

mitchus
  • Have you tried it? Theoretically, convolution layers with the same kernel size and the same input/output channel counts will have weight tensors with the same dimensions. – E_net4 Jul 22 '17 at 10:02
  • @E_net4 Indeed the dimensions match; my question is how to get Keras to share the weights (i.e. I would like to try it, I just don't know how :)). – mitchus Jul 22 '17 at 10:11

1 Answer

To my knowledge, this cannot be done through the common, API-level usage of Keras. However, if you dig a bit deeper, there are some (ugly) ways to share the weights.

First of all, the weights of a Conv2D layer are created inside its build() function, via a call to add_weight():

    self.kernel = self.add_weight(shape=kernel_shape,
                                  initializer=self.kernel_initializer,
                                  name='kernel',
                                  regularizer=self.kernel_regularizer,
                                  constraint=self.kernel_constraint)
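
As a quick sanity check (a small sketch, assuming the standalone Keras 2.x API used throughout this answer), you can see that the kernel and bias attributes only exist once build() has run:

    from keras.layers import Conv2D

    conv = Conv2D(64, 3, padding='same')
    print(hasattr(conv, 'kernel'))   # False: no weight variables before build()
    conv.build((None, 299, 299, 3))  # build() creates conv.kernel and conv.bias via add_weight()
    print(conv.kernel, conv.bias)    # the variables we want to share now exist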

For your usage (i.e., the default trainable/constraint/regularizer/initializer), add_weight() does nothing special beyond appending the weight variables to _trainable_weights:

    weight = K.variable(initializer(shape), dtype=dtype, name=name)
    ...
        self._trainable_weights.append(weight)

Finally, since build() is only called inside __call__() if the layer hasn't been built yet, shared weights between layers can be created as follows:

  1. Call conv1.build() to initialize the conv1.kernel and conv1.bias variables that are to be shared.
  2. Call conv2.build() to initialize the layer.
  3. Replace conv2.kernel and conv2.bias with conv1.kernel and conv1.bias.
  4. Remove conv2.kernel and conv2.bias from conv2._trainable_weights.
  5. Append conv1.kernel and conv1.bias to conv2._trainable_weights.
  6. Finish the model definition. Here conv2.__call__() will be invoked; however, since conv2 has already been built, the weights will not be re-initialized.

The following code snippet may be helpful:

    # imports for the snippet (this answer assumes the standalone Keras 2.x API,
    # e.g. input_img._keras_shape below, with the TensorFlow backend)
    import numpy as np
    from keras import backend as K
    from keras.layers import Input, Conv2D, GlobalAveragePooling2D, Dense, concatenate
    from keras.models import Model

    def create_shared_weights(conv1, conv2, input_shape):
        with K.name_scope(conv1.name):
            conv1.build(input_shape)
        with K.name_scope(conv2.name):
            conv2.build(input_shape)
        # point conv2 at conv1's variables and register them as conv2's trainable weights
        conv2.kernel = conv1.kernel
        conv2.bias = conv1.bias
        conv2._trainable_weights = []
        conv2._trainable_weights.append(conv2.kernel)
        conv2._trainable_weights.append(conv2.bias)

    # check if weights are successfully shared
    input_img = Input(shape=(299, 299, 3))
    conv1 = Conv2D(64, 3, padding='same')
    conv2 = Conv2D(64, 3, padding='valid')
    create_shared_weights(conv1, conv2, input_img._keras_shape)
    print(conv2.weights == conv1.weights)  # True

    # check if weights are equal after model fitting
    left = conv1(input_img)
    right = conv2(input_img)
    left = GlobalAveragePooling2D()(left)
    right = GlobalAveragePooling2D()(right)
    merged = concatenate([left, right])
    output = Dense(1)(merged)
    model = Model(input_img, output)
    model.compile(loss='binary_crossentropy', optimizer='adam')

    X = np.random.rand(5, 299, 299, 3)
    Y = np.random.randint(2, size=5)
    model.fit(X, Y)
    print([np.all(w1 == w2) for w1, w2 in zip(conv1.get_weights(), conv2.get_weights())])  # [True, True]

One drawback of this hacky weight-sharing is that the weights will not remain shared after model saving/loading. This will not affect prediction, but it may be problematic if you want to load the trained model for further fine-tuning.
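
As a possible workaround (a rough, untested sketch; 'shared_model.h5' is just a placeholder path), you can save only the weight values, rebuild the model from the same code so that the sharing hack is applied again, and then load the weights back:

    model.save_weights('shared_model.h5')

    # in a new session: rebuild the model with the same code as above, which
    # re-establishes the sharing, then restore the trained weight values
    input_img = Input(shape=(299, 299, 3))
    conv1 = Conv2D(64, 3, padding='same')
    conv2 = Conv2D(64, 3, padding='valid')
    create_shared_weights(conv1, conv2, input_img._keras_shape)

    left = GlobalAveragePooling2D()(conv1(input_img))
    right = GlobalAveragePooling2D()(conv2(input_img))
    output = Dense(1)(concatenate([left, right]))
    model = Model(input_img, output)
    model.compile(loss='binary_crossentropy', optimizer='adam')

    model.load_weights('shared_model.h5')  # conv1 and conv2 still share one set of variables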

Yu-Yang
  • Interesting! I didn't know it could be done like this. The inability to save the sharing is an issue, though, since I would like to be able to fine-tune the model. – mitchus Jul 23 '17 at 09:02
  • I haven't tried, but I think you can use `model.save_weights()` to save the weight arrays only instead of the entire model object. For fine-tuning, just recreate the model with the same code, and then call `model.load_weights()`. – Yu-Yang Jul 23 '17 at 11:42
  • Are you sure this works? I get OOM, and I can tell from `tensorflow/core/common_runtime/bfc_allocator.cc` that TF is actually trying to allocate multiple copies of the weights. – cadama Mar 22 '18 at 23:17
  • My issue was due to custom layers. If those are the weights you are sharing, it is better to pass them during initialization of the layer. That solved my OOM problem. – cadama Mar 23 '18 at 00:07
  • Well, if you are writing custom layers, then of course you shouldn't use this solution. Rewriting the layer is much cleaner. As I said, this is merely a quick hack. – Yu-Yang Mar 23 '18 at 02:17