I wonder if it is possible to include standardization of input features directly into a Keras model, in such a way that it is automatically applied when the model is loaded with models.load_model? This would avoid the need for carrying around the normalization transformation from the training set when applying the model elsewhere.

I understand a possible solution is to include the Keras model in a scikit-learn pipeline (see for instance How to insert Keras model into scikit-learn pipeline?). However, I would prefer not to set up a pipeline and ideally just use models.load_model. Are there possible solutions that do not involve using anything other than Keras or TensorFlow?
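
For completeness, a minimal sketch of that pipeline alternative (the approach I would rather avoid), assuming toy data, an arbitrary architecture, and the KerasRegressor wrapper that shipped with TensorFlow at the time (it has since been deprecated in favour of SciKeras):

```python
# A rough illustration only: StandardScaler is fitted on the full training
# set, then chained with the network. Shapes, sizes and epochs are
# placeholders, not part of my actual setup.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

def build_model():
    model = keras.Sequential([
        layers.Dense(32, activation='relu', input_shape=(10,)),
        layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')
    return model

x_train = np.random.rand(256, 10).astype('float32')
y_train = np.random.rand(256).astype('float32')

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('net', KerasRegressor(build_fn=build_model, epochs=5, verbose=0)),
])
pipeline.fit(x_train, y_train)

# Downside: the fitted pipeline (scaler included) has to be persisted with
# e.g. joblib and shipped alongside, instead of a single file restored by
# keras.models.load_model.
```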

Another possible solution is simply using a BatchNormalization layer as the first layer in the network. This builds normalization into the network, but during training the initial normalization then depends on the statistics of the (small) batches rather than on the entire training set, and therefore varies unnecessarily between training batches.
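
For reference, a minimal sketch of that BatchNormalization approach, with toy data and an arbitrary architecture standing in for my actual setup:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy data standing in for a real training set (10 numeric features).
x_train = np.random.rand(256, 10).astype('float32')
y_train = np.random.rand(256, 1).astype('float32')

model = keras.Sequential([
    # Normalization lives inside the model; during training it uses the
    # statistics of each batch, which is the drawback discussed above.
    layers.BatchNormalization(input_shape=(10,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)

# The moving mean/variance are stored as weights, so no external scaler
# has to travel with the saved file.
model.save('model_with_bn.h5')
restored = keras.models.load_model('model_with_bn.h5')
```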

(A similar question was asked earlier (How to include normalization of features in Keras regression model?), but neither the question nor the answer seemed very clear.)

W. Verbeke
  • Are all inputs numeric? – Daniel Möller Nov 29 '19 at 14:27
  • In my case yes. – W. Verbeke Nov 29 '19 at 14:28
  • Why not use a `BatchNormalization` layer? – Daniel Möller Nov 29 '19 at 14:31
  • This is in fact the solution I currently apply. But it is less optimal than a normalization over the full training set because of the dependence on the batch statistics. I updated my question to mention this. – W. Verbeke Nov 29 '19 at 14:33
  • Honestly I don't see the problem with `BatchNormalization`. It does a great job and usually makes better models. --- The alternative solution would be using a Lambda layer where you manually code the transformations (sketched below, after these comments). The downside is that you will always need a `load_model(custom_objects={'lambda_func':lambda_func})` and carry the lambda code around. – Daniel Möller Nov 29 '19 at 14:52
  • BatchNormalization is great for reducing internal covariate shift when applied between hidden layers. But it typically does NOT produce better models when used as a replacement for the initial normalization. And even between hidden layers, reducing its dependence on the limited batch size is known to improve its performance (see https://arxiv.org/abs/1702.03275). In any case I don't think it was ever intended to be used for this purpose, because it just limits your statistical accuracy for no reason. – W. Verbeke Nov 29 '19 at 14:54
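
A minimal sketch of the Lambda-layer alternative mentioned in the comments above, assuming placeholder training-set statistics and an arbitrary toy architecture:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Statistics computed once over the full training set (placeholder values).
train_mean = np.array([0.5, 2.0, -1.0], dtype='float32')
train_std = np.array([1.0, 3.0, 0.5], dtype='float32')

def standardize(x):
    # Standardization baked into the graph; the constants are captured here.
    return (x - train_mean) / train_std

model = keras.Sequential([
    layers.Lambda(standardize, input_shape=(3,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.save('model_with_lambda.h5')

# As the comment notes, the function (and the statistics it refers to) must
# still be available wherever the model is loaded:
restored = keras.models.load_model(
    'model_with_lambda.h5',
    custom_objects={'standardize': standardize},
)
```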

0 Answers