
While doing GP regression in GPflow 2.0, I want to set hard bounds on the lengthscale (i.e., limit the range over which it can be optimized). Following this thread (Setting hyperparameter optimization bounds in GPflow 2.0), I constructed a TensorFlow Probability bijector chain (see the bounded_lengthscale function below). However, the bijector chain does not prevent the model from optimizing outside the intended bounds. What do I need to change so that bounded_lengthscale puts hard bounds on the optimization?

Below is the MRE:

import gpflow 
import numpy as np
from gpflow.utilities import print_summary
import tensorflow as tf
from tensorflow_probability import bijectors as tfb

# Noisy training data
noise = 0.3
X = np.arange(-3, 4, 1).reshape(-1, 1).astype('float64')
Y = (np.sin(X) + noise * np.random.randn(*X.shape)).reshape(-1,1)

def bounded_lengthscale(low, high, lengthscale):
    """Returns lengthscale Parameter with optimization bounds."""
    affine = tfb.AffineScalar(shift=low, scale=high-low)
    sigmoid = tfb.Sigmoid()
    logistic = tfb.Chain([affine, sigmoid])
    parameter = gpflow.Parameter(lengthscale, transform=logistic, dtype=tf.float32)
    parameter = tf.cast(parameter, dtype=tf.float64)
    return parameter

# build GPR model
k = gpflow.kernels.Matern52()
m = gpflow.models.GPR(data=(X, Y), kernel=k)

m.kernel.lengthscale.assign(bounded_lengthscale(0, 1, 0.5))

print_summary(m)

# train model
@tf.function(autograph=False)
def objective_closure():
    return - m.log_marginal_likelihood()

opt = gpflow.optimizers.Scipy()
opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables)
print_summary(m)

Thanks!

STJ
Rcameron
  • Version 0.8.0 of tensorflow_probability does not have the Shift/Scale bijectors, instead you need to use `Affine` or `AffineScalar` (with `shift=` and `scale=` arguments). – STJ Dec 27 '19 at 22:59
  • @STJ, you are right. When running the AffineScalar bijector I get the following warning: WARNING:tensorflow:From :4: AffineScalar.__init__ (from tensorflow_probability.python.bijectors.affine_scalar) is deprecated and will be removed after 2020-01-01. Instructions for updating: `AffineScalar` bijector is deprecated; please use `tfb.Shift(loc)(tfb.Scale(...))` instead. Is that implying that a future version of tensorflow_probability will have the Shift/Scale bijectors? – Rcameron Dec 28 '19 at 14:11

2 Answers


In the MRE you assign a new value to a Parameter that already exists (and does not have the logistic transform). That value is the constrained-space value of the Parameter constructed with the logistic transform, but the transform itself isn't carried over. Instead, you need to replace the transform-less Parameter with one that has the transform you want: `m.kernel.lengthscale = bounded_lengthscale(0, 1, 0.5)`.
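For intuition, the Chain([affine, sigmoid]) transform is just a scaled logistic function, lengthscale = low + (high − low)·σ(θ): whatever unconstrained value θ the optimiser proposes, the constrained lengthscale stays strictly inside (low, high). A minimal pure-Python sketch of this mapping (no GPflow needed; the `constrained` helper name is mine):

```python
import math

def constrained(theta, low, high):
    """low + (high - low) * sigmoid(theta): the mapping the bijector chain applies."""
    return low + (high - low) / (1.0 + math.exp(-theta))

# Even large unconstrained values map strictly inside (low, high)
for theta in (-30.0, -1.0, 0.0, 1.0, 30.0):
    assert 0.0 < constrained(theta, 0.0, 1.0) < 1.0
```

The optimiser only ever sees and updates θ, so no step it takes can push the lengthscale outside the bounds.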

Note that the object you assign to the kernel.lengthscale attribute must be a Parameter instance; if you assign the return value of tf.cast(parameter) as in the MRE, this is equivalent to a constant, and it won't actually be optimised!

Simply removing the tf.cast in the MRE won't immediately work, due to a float32/float64 mismatch. To fix it, the AffineScalar bijector needs to operate in float64; it does not have a dtype argument, so instead cast its shift= and scale= arguments to the required type:

def bounded_lengthscale(low, high, lengthscale):
    """Make lengthscale tfp Parameter with optimization bounds."""
    affine = tfb.AffineScalar(shift=tf.cast(low, tf.float64),
                              scale=tf.cast(high-low, tf.float64))
    sigmoid = tfb.Sigmoid()
    logistic = tfb.Chain([affine, sigmoid])
    parameter = gpflow.Parameter(lengthscale, transform=logistic, dtype=tf.float64)
    return parameter

m.kernel.lengthscale = bounded_lengthscale(0, 1, 0.5)

(GPflow should probably contain a helper function like this to make bounded parameter transforms easier to use - GPflow always appreciates people helping out, so if you want to turn this into a pull request, please do!)

STJ
  • may there be stars in your crown. This is extremely helpful and solves my problem. Thank you. I've never submitted a pull request but am open to it. Where would you recommend the helper function be added in the GPflow source code? – Rcameron Dec 28 '19 at 14:26
  • @Rcameron in [gpflow.utilities.bijectors](https://github.com/GPflow/GPflow/blob/develop/gpflow/utilities/bijectors.py), which already has `positive` and `triangular` utilities. – STJ Dec 28 '19 at 14:41
  • 3
    FYI we have a change going into `tfb.Sigmoid` soon that will allow you to provide `low` and `high` parameters and do this with better numerical stability. – Brian Patton Jan 02 '20 at 17:35

tfb.Sigmoid now accepts low and high parameters, as @Brian Patton predicted in a comment.

The code can therefore be simplified to:

from tensorflow_probability import bijectors as tfb

def bounded_lengthscale(low, high, lengthscale):
    """Make lengthscale tfp Parameter with optimization bounds."""
    sigmoid = tfb.Sigmoid(low, high)
    parameter = gpflow.Parameter(lengthscale, transform=sigmoid, dtype='float32')
    return parameter

m.kernel.lengthscale = bounded_lengthscale(0, 1, 0.5)
C. Tim
  • 3
    Beware, you may need to cast `low`, `high` values and `dtype` parameter to double for tensorflow. – C. Tim Jun 28 '21 at 16:29
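The dtype caveat in the comment above can be sketched in pure NumPy (the `sigmoid_bounded` helper name is mine; it mirrors the low + (high − low)·σ(θ) mapping that tfb.Sigmoid(low, high) applies): casting the bounds to float64 keeps the constrained value in float64 as well, matching the GPR model's default dtype.

```python
import numpy as np

def sigmoid_bounded(theta, low, high):
    """Mirror of tfb.Sigmoid(low, high): low + (high - low) * sigmoid(theta)."""
    # Cast the bounds to double, as the comment above suggests doing for TensorFlow
    low, high = np.float64(low), np.float64(high)
    return low + (high - low) / (1.0 + np.exp(-np.float64(theta)))

value = sigmoid_bounded(0.0, 0, 1)
assert value.dtype == np.float64   # result stays in double precision
assert 0.0 < value < 1.0
```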