Although sklearn.gaussian_process.GaussianProcessRegressor
does not directly implement incremental learning, it is not necessary to fully retrain your model from scratch.
To understand why this works, it helps to recall the GPR fundamentals. Training a GPR model mainly consists of optimising the kernel hyperparameters to maximise an objective function (the log-marginal likelihood by default). When you fit the same kernel to similar data, the optimal parameters tend to be close, so they can be reused. Since the optimiser has a convergence-based stopping condition, re-optimisation is much faster if you initialise the parameters with the previously trained values (a warm start).
Below is an example based on the one in the sklearn docs.
from time import time
from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
X, y = make_friedman2(n_samples=1000, noise=0.1, random_state=0)
kernel = DotProduct() + WhiteKernel()
start = time()
gpr = GaussianProcessRegressor(kernel=kernel,
                               random_state=0).fit(X, y)
print(f'Time: {time()-start:.3f}')
# Time: 4.851
print(gpr.score(X, y))
# 0.3530096529277589
# fit copies the kernel into the regressor, so we need to
# retrieve the trained parameters from gpr.kernel_
kernel.set_params(**(gpr.kernel_.get_params()))
# use slightly different data
X, y = make_friedman2(n_samples=1000, noise=0.1, random_state=1)
# fit again: the kernel now starts from the pre-trained
# parameters, so the optimiser has less work to do
start = time()
gpr2 = GaussianProcessRegressor(kernel=kernel,
                                random_state=0).fit(X, y)
print(f'Time: {time()-start:.3f}')
# Time: 1.661
print(gpr2.score(X, y))
# 0.38599549162834046
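You can check that the warm start took effect by inspecting the fitted kernels: gpr.kernel_ and gpr2.kernel_ hold the optimised hyperparameters of each fit, and gpr2 started its optimisation from gpr's optimum.

# compare the optimised kernels of both fits; gpr2's
# optimisation began at gpr's solution
print(gpr.kernel_)
print(gpr2.kernel_)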
As you can see, the warm-started fit takes significantly less time than training from scratch. This is not truly incremental learning, but it can substantially speed up refitting in a streaming-data setting, for example as sketched below.
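As a minimal sketch of that streaming setting: the warm start can be folded into a loop over incoming batches. Here the batches are a stand-in built from make_friedman2 with different random seeds; in practice they would come from your actual data source.

from sklearn.datasets import make_friedman2
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

kernel = DotProduct() + WhiteKernel()
# stand-in for a real stream: five similar batches of Friedman #2 data
batches = (make_friedman2(n_samples=1000, noise=0.1, random_state=i)
           for i in range(5))
for X_batch, y_batch in batches:
    # each fit starts from the kernel parameters optimised on the previous batch
    gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X_batch, y_batch)
    # copy the optimised parameters back so the next fit is warm-started
    kernel.set_params(**gpr.kernel_.get_params())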