import tensorflow as tf
from tensorflow import keras

class ConstLayer(tf.keras.layers.Layer):
    def __init__(self, x, **kwargs):
        super(ConstLayer, self).__init__(**kwargs)
        self.x = tf.Variable(x, trainable=False)

    def call(self, input):
        return self.x

    def get_config(self):
        # Note: the original model runs with eager execution disabled
        config = super(ConstLayer, self).get_config()
        config['x'] = self.x
        return config
    


model_test_const_layer = keras.Sequential([
    keras.Input(shape=(784,)),
    ConstLayer([[1.,1.]], name="anchors"),
    keras.layers.Dense(10),
])

model_test_const_layer.summary()
model_test_const_layer.save("../models/my_model_test_constlayer.h5")
del model_test_const_layer
model_test_const_layer = keras.models.load_model("../models/my_model_test_constlayer.h5",custom_objects={'ConstLayer': ConstLayer,})
model_test_const_layer.summary()

This code is a minimal reproduction of an error raised by a larger Keras model with a ResNet-101 backbone.

Errors when the model includes the custom layer ConstLayer:

  • Without the line config['x'] = self.x, loading the saved model with keras.models.load_model fails with: TypeError: __init__() missing 1 required positional argument: 'x'

  • With config['x'] = self.x, the error is: NotImplementedError: deepcopy() is only available when eager execution is enabled. Note: the larger model requires eager execution to be disabled with tf.compat.v1.disable_eager_execution().

Any help and clues are greatly appreciated!

Mihai.Mehe

1 Answer


As far as I understand it, TF cannot copy a tf.Variable when eager execution is disabled, which is what happens when get_config() returns self.x. Save the original value passed to the layer and return that from the config instead:

import tensorflow as tf
import tensorflow.keras as keras

tf.compat.v1.disable_eager_execution()

class ConstLayer(tf.keras.layers.Layer):
    def __init__(self, x, **kwargs):
        super(ConstLayer, self).__init__(**kwargs)
        self._config = {'x': x}
        self.x = tf.Variable(x, trainable=False)

    def call(self, input):
        return self.x

    def get_config(self):
        # Note: the original model runs with eager execution disabled
        config = {
            **super(ConstLayer, self).get_config(),
            **self._config
        }
        return config


model_test_const_layer = keras.Sequential([
    keras.Input(shape=(784,)),
    ConstLayer([[1., 1.]], name="anchors"),
    keras.layers.Dense(10),
])

model_test_const_layer.summary()
model_test_const_layer.save("../models/my_model_test_constlayer.h5")
del model_test_const_layer
model_test_const_layer = keras.models.load_model(
    "../models/my_model_test_constlayer.h5", custom_objects={'ConstLayer': ConstLayer, })
model_test_const_layer.summary()
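
A note for constants supplied as NumPy arrays, as in the larger model discussed in the comments: the sketch below shows why a config made of plain Python values is easy to copy and serialize. It deliberately avoids TensorFlow. The helpers `to_config_value` and `from_config_value` are hypothetical names introduced for illustration, and the small array is only a stand-in for the real anchors. It assumes the config is deep-copied and written out as JSON when the model is saved, which is consistent with the deepcopy error in the question.

```python
import copy
import json

import numpy as np


def to_config_value(x):
    """Hypothetical helper: turn a list or np.ndarray into plain nested lists."""
    return np.asarray(x, dtype=np.float32).tolist()


def from_config_value(value):
    """Hypothetical helper: rebuild the float32 array to hand to tf.Variable."""
    return np.asarray(value, dtype=np.float32)


# Small stand-in for the real anchors array of shape (1, 261888, 4).
anchors = np.ones((1, 4, 4), dtype=np.float32)
config = {"name": "anchors", "x": to_config_value(anchors)}

# Plain Python values deep-copy without involving TensorFlow at all ...
config_copy = copy.deepcopy(config)
# ... and survive a JSON round trip unchanged.
restored = json.loads(json.dumps(config_copy))
rebuilt = from_config_value(restored["x"])
```

In the layer itself, this would correspond to storing something like `to_config_value(x)` in `self._config` while still building the `tf.Variable` from `x`. Whether the explicit conversion is needed depends on how the installed Keras version serializes NumPy arrays, so treat it as a defensive option rather than a requirement.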
Plagon
  • The model now saves, but changing the custom ConstLayer in the larger original model makes training stop before epoch 1, even though the model compiles. Building the layer from its config also raises an error. For example, building a test layer and reloading it with `cl_test = ConstLayer([[1.,1.]]); config_test_cl = cl_test.get_config(); print(config_test_cl); cl_test_ = ConstLayer.from_config(**config_test_cl)` gives `TypeError: from_config() got an unexpected keyword argument 'name'`. – Mihai.Mehe Jan 09 '23 at 22:29
  • `from_config` expects a `dict`; use `cl_test_ = ConstLayer.from_config(config_test_cl)`. – Plagon Jan 09 '23 at 22:43
  • This works, thanks. My model still does not train, which is new, so it must be a different issue related to this ConstLayer. If I remove `**self._config` from get_config(), it trains and saves but does not load, because it needs `x`. – Mihai.Mehe Jan 09 '23 at 23:08
  • I tried to train the model from the code above, and it works just fine with checkpointing, saving, and loading after training. At least for now, I can't reproduce it. – Plagon Jan 09 '23 at 23:35
  • Great. I am trying to reproduce this behavior in a smaller example. – Mihai.Mehe Jan 10 '23 at 00:33
  • All I can add is that `ConstLayer([[1., 1.]], name="anchors")` is actually initialized with an np.array of much larger shape, (1, 261888, 4), not with `[[1.,1.]]`. Saved on disk, this array is 4.19 MB. I am not sure this is an issue, though, since the model builds and compiles. – Mihai.Mehe Jan 10 '23 at 05:09
  • For me it still works after changing it to a `np.array`. – Plagon Jan 10 '23 at 05:13
  • Yes, the smaller model compiles, saves, loads, and trains fine. The extended model hangs just before the first epoch. Does it matter that I am training on a GPU? – Mihai.Mehe Jan 10 '23 at 05:42
  • I don't see why this should matter – Plagon Jan 10 '23 at 05:43
  • Opened a discussion on the extended model here https://github.com/matterport/Mask_RCNN/issues/2922 – Mihai.Mehe Jan 10 '23 at 14:16
  • Could this be related to the hang before epoch 1? https://stackoverflow.com/questions/53471855/why-does-keras-halt-at-the-first-epoch-when-i-attempt-to-train-it-using-fit-gene – Mihai.Mehe Jan 10 '23 at 15:34
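
The `from_config` exchange in the comments can be reproduced without TensorFlow. The sketch below uses `StandInLayer`, a hypothetical class that only mimics the `get_config` / `from_config` pattern (the default Keras `Layer.from_config` is likewise `cls(**config)`). It is not the real `ConstLayer`, just an illustration of why unpacking the dict with `**` fails while passing the dict itself works.

```python
class StandInLayer:
    """Hypothetical stand-in that mimics the Keras config pattern."""

    def __init__(self, x, name=None):
        self.x = x
        self.name = name

    def get_config(self):
        # Return only plain Python values, as in the answer above.
        return {"name": self.name, "x": self.x}

    @classmethod
    def from_config(cls, config):
        # Takes ONE positional dict and unpacks it into the constructor.
        return cls(**config)


config = StandInLayer([[1.0, 1.0]], name="anchors").get_config()

try:
    # Wrong: ** turns every config key into a keyword argument of
    # from_config itself, which only accepts `config`.
    StandInLayer.from_config(**config)
    unpack_error = None
except TypeError as exc:
    unpack_error = str(exc)

# Right: pass the dict as a single argument.
rebuilt = StandInLayer.from_config(config)
```

Here `unpack_error` holds a message of the same kind as the `TypeError` quoted in the first comment, while `rebuilt` carries the original `x` and `name`.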