Although sklearn.gaussian_process.GaussianProcessRegressor
does not directly implement incremental learning, it is not necessary to fully retrain your model from scratch.
To understand why this works, it helps to recall the GPR fundamentals. Training a GPR model mainly consists of optimising the kernel hyperparameters to maximise an objective function (the log-marginal likelihood by default). When you fit the same kernel to similar data, the optimal parameters tend to be close, so they can be reused. Since the optimiser has a convergence-based stopping condition, re-optimisation is much faster if you initialise the parameters with the previously trained values (a warm start).
Below is an example based on the one in the sklearn docs.
from time import time
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
X, y = make_friedman2(n_samples=1000, noise=0.1, random_state=0)
kernel = DotProduct() + WhiteKernel()
start = time()
gpr = GaussianProcessRegressor(kernel=kernel,
                               random_state=0).fit(X, y)
print(f'Time: {time()-start:.3f}')
# Time: 4.851
print(gpr.score(X, y))
# 0.3530096529277589
# fit copies the kernel into the regressor, so we need to
# retrieve the trained parameters from gpr.kernel_
kernel.set_params(**(gpr.kernel_.get_params()))
# use slightly different data
X, y = make_friedman2(n_samples=1000, noise=0.1, random_state=1)
# fit again: the kernel now starts from the pre-trained
# parameters, so the optimiser has less work to do
start = time()
gpr2 = GaussianProcessRegressor(kernel=kernel,
                                random_state=0).fit(X, y)
print(f'Time: {time()-start:.3f}')
# Time: 1.661
print(gpr2.score(X, y))
# 0.38599549162834046
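You can check that the warm start took effect by inspecting the fitted kernels: gpr.kernel_ and gpr2.kernel_ hold the optimised hyperparameters of each fit, and gpr2 started its optimisation from gpr's optimum.

# compare the optimised kernels of both fits; gpr2's
# optimisation began at gpr's solution
print(gpr.kernel_)
print(gpr2.kernel_)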
As you can see, the warm-started fit takes significantly less time than training from scratch. This is not truly incremental learning, but it can substantially speed up refitting in a streaming-data setting, for example as sketched below.
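As a minimal sketch of that streaming setting: the warm start can be folded into a loop over incoming batches. Here the batches are a stand-in built from make_friedman2 with different random seeds; in practice they would come from your actual data source.

from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

kernel = DotProduct() + WhiteKernel()
# stand-in for a real stream: five similar batches of Friedman #2 data
batches = (make_friedman2(n_samples=1000, noise=0.1, random_state=i)
           for i in range(5))
for X_batch, y_batch in batches:
    # each fit starts from the kernel parameters optimised on the previous batch
    gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X_batch, y_batch)
    # copy the optimised parameters back so the next fit is warm-started
    kernel.set_params(**gpr.kernel_.get_params())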