
I would like to perform multivariate regression using Gaussian process regression (GPR) as implemented in GPflow version 2, installed with `pip install gpflow==2.0.0rc1`.

Below is some example code that generates some 2D data, attempts to fit it using GPR, and finally computes the difference between the true input data and the GPR prediction.

Eventually I would like to extend this to higher dimensions, test against a validation set to check for over-fitting, and experiment with other kernels and "Automatic Relevance Determination", but understanding how to get this to work is the first step.

Thanks!

The following code snippet will work in a Jupyter notebook.

import gpflow
import numpy as np
import matplotlib
from gpflow.utilities import print_summary

%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (12, 6)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def gen_data(X, Y):
    """
    make some fake data.
    X, Y are np.ndarrays with shape (N,) where
    N is the number of samples.
    """

    ys = []
    for x0, x1 in zip(X,Y):
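        # note: this first assignment is overwritten by the next line,
        # so only the x1 * sin(10 * x0) term contributes to y
        y = x0 * np.sin(x0*10)
        y = x1 * np.sin(x0*10)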
        y += 1
        ys.append(y)
    return np.array(ys)


# generate some fake data
x = np.linspace(0, 1, 20)
X, Y = np.meshgrid(x, x)

X = X.ravel()
Y = Y.ravel()

z = gen_data(X, Y)

#note X.shape, Y.shape and z.shape
#are all (400,) for this case.

# if you would like to plot the data you can do the following
fig = plt.figure()
ax = Axes3D(fig)
ax.scatter(X, Y, z, s=100, c='k')


# had to set this 
# to avoid the following error
# tensorflow.python.framework.errors_impl.InvalidArgumentError: Cholesky decomposition was not successful. The input might not be valid. [Op:Cholesky]
gpflow.config.set_default_positive_minimum(1e-7)

# setup the kernel

k = gpflow.kernels.Matern52()


# set up GPR model

# I think the shape of the independent data
# should be (400, 2) for this case
XY = np.column_stack([X, Y])
print(XY.shape) # this will be (400, 2)

m = gpflow.models.GPR(data=(XY, z), kernel=k, mean_function=None)

# optimise hyper-parameters
opt = gpflow.optimizers.Scipy()

def objective_closure():
    return - m.log_marginal_likelihood()

opt_logs = opt.minimize(objective_closure,
                        m.trainable_variables,
                        options=dict(maxiter=100)
                       )


# predict training set
mean, var = m.predict_f(XY)

print(mean.numpy().shape)
# (400, 400)
# I would expect this to be (400,)

# If it was then I could compute the difference
# between the true data and the GPR prediction
# `diff = mean - z`
# but because the shape is not as expected this of course
# won't work.


cyberface
For future reference, a possible solution is to change the line `z = gen_data(X, Y)` to `z = gen_data(X, Y); z = z.reshape(-1, 1)`. Thanks for the help! – cyberface Feb 10 '20 at 12:03

1 Answer


GPflow expects the targets `z` to have shape (N, 1), whereas in your case they have shape (N,). However, GPflow does not validate this, so it is a missing check in the library and not your fault.
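
To illustrate the shape issue without GPflow, here is a minimal NumPy-only sketch. The array `mean` is a random stand-in for a predictive mean of shape (N, 1), not actual GPflow output, and the idea that the (400, 400) result in the question comes from this kind of broadcasting inside the model is an assumption rather than something confirmed here.

```python
import numpy as np

N = 400
rng = np.random.default_rng(0)

# Targets with shape (N,), as produced by gen_data in the question.
z = rng.normal(size=N)

# Hypothetical stand-in for a correctly shaped predictive mean, shape (N, 1).
mean = rng.normal(size=(N, 1))

# Subtracting a (N,) array from a (N, 1) array broadcasts to (N, N).
bad_diff = mean - z
print(bad_diff.shape)

# Reshaping the targets into a column vector keeps the result at (N, 1).
z_col = z.reshape(-1, 1)
diff = mean - z_col
print(diff.shape)
```

In the question's code, the equivalent change is `z = gen_data(X, Y).reshape(-1, 1)` before constructing the `GPR` model, after which `mean.numpy() - z` should have one row per training point.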

Artem Artemev