
I'm new to Gaussian processes and struggling to validate the output of my scikit-learn GPR (GaussianProcessRegressor).

I'm particularly concerned that my GPR returns a score of 1. That doesn't make sense to me, because I don't expect the coefficient of determination (R²) for this data to be exactly 1.

Does a score of 1 point to a particular problem with the GPR or with the data? My code is below; X and Y each contain 15 values.

I have also tried the Matern and RBF kernels on their own with default parameters. The predictions differ slightly, but the score is still 1 both times.

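```python
import numpy as np
from sklearn import gaussian_process
from sklearn.gaussian_process.kernels import Matern, RBF

# X: 2-D array of shape (15, 1); Y: array of length 15
gp = gaussian_process.GaussianProcessRegressor(
    alpha=1e-10,
    copy_X_train=True,
    kernel=Matern() + 1 * RBF(1),
    n_restarts_optimizer=10,
    normalize_y=False,
    random_state=None)

gp.fit(X, Y)
score = gp.score(X, Y)  # R^2 on the same data the model was fitted to
print(score)

# Predict on a dense grid over [0, 10], with the predictive standard deviation
x_pred = np.atleast_2d(np.linspace(0, 10, 1000)).T
y_pred, sigma = gp.predict(x_pred, return_std=True)
```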
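A likely reason for the score of 1, sketched under assumptions: `alpha` is the value added to the diagonal of the kernel matrix and can be read as the variance of the measurement noise. With `alpha=1e-10` and no noise term in the kernel, the GP treats the targets as essentially noise-free, so its posterior mean passes through every training point, and R² computed on those same points comes out (almost exactly) 1. The sketch below uses hypothetical data (15 noisy samples of sin(x)) in place of the real X and Y, and fixes the kernel with `optimizer=None` so that the two fits differ only in `alpha`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical stand-in for the real data: 15 noisy samples of sin(x) on [0, 10]
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 15).reshape(-1, 1)  # 2-D, shape (15, 1)
Y = np.sin(X).ravel() + rng.normal(0, 0.3, size=15)

kernel = RBF(length_scale=1.0)  # fixed, illustrative hyperparameters

# alpha=1e-10 (as in the question): targets treated as (almost) noise-free,
# so the posterior mean passes through every training point
gp_interp = GaussianProcessRegressor(kernel=kernel, alpha=1e-10, optimizer=None)
gp_interp.fit(X, Y)
r2_interp = gp_interp.score(X, Y)

# alpha=0.09 (= 0.3**2, the noise variance used to generate Y): the mean is
# allowed to deviate from the noisy targets instead of interpolating them
gp_noisy = GaussianProcessRegressor(kernel=kernel, alpha=0.09, optimizer=None)
gp_noisy.fit(X, Y)
r2_noisy = gp_noisy.score(X, Y)

print(f"training R^2 with alpha=1e-10: {r2_interp:.6f}")
print(f"training R^2 with alpha=0.09:  {r2_noisy:.6f}")
```

In practice the noise level is rarely known in advance; adding a `WhiteKernel` term to the kernel lets the optimizer estimate it from the data instead of fixing `alpha` by hand. Either way, a training R² below 1 only shows that the model is smoothing rather than interpolating; how well it generalizes still has to be checked on held-out data.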

Any advice is appreciated, thanks!

M-Wi
  • It's not unheard of for such algorithms to get an R-squared of 1 on the *training* data, as in your case here; the real test is on unseen (test) data. – desertnaut Apr 06 '20 at 20:11
  • @desertnaut forgive the naivety, but if I only have one set of data, what process am I supposed to follow to make predictions? I think I'm fundamentally missing the point about training vs testing, which might be the source of my confusion. – M-Wi Apr 06 '20 at 20:17
    see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html ; you train on the train set and evaluate on the test set – desertnaut Apr 06 '20 at 20:19
  • perfect, thank you! – M-Wi Apr 06 '20 at 20:23
  • You may want to know that, in general, use of R-squared is not recommended in modern ML; see the last part of my own answer here: https://stackoverflow.com/questions/54614157/scikit-learn-statsmodels-which-r-squared-is-correct/54618898#54618898 – desertnaut Apr 06 '20 at 20:26
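Following the two comments about held-out data and R², here is a minimal sketch under assumptions: hold out part of the data with `train_test_split`, fit only on the rest, and report an error metric such as MAE or RMSE on the held-out points rather than R² on the training points. With only 15 samples a single split leaves very few test points, so leave-one-out cross-validation (a substitute technique, not something suggested in the comments) is shown as well. The data are hypothetical stand-ins, and the kernel (RBF plus a `WhiteKernel` noise term), `test_size=0.33`, and the seeds are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split

# Hypothetical stand-in data (15 points, as in the question)
rng = np.random.RandomState(0)
X = np.linspace(0, 10, 15).reshape(-1, 1)
Y = np.sin(X).ravel() + rng.normal(0, 0.3, size=15)

# Illustrative kernel: smooth RBF part plus a learned noise level
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# Hold out a third of the points (5 of 15); fit only on the remaining 10
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=42)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5, random_state=0)
gp.fit(X_train, Y_train)
Y_hat = gp.predict(X_test)

# Error metrics on points the model has never seen
test_mae = mean_absolute_error(Y_test, Y_hat)
test_rmse = np.sqrt(mean_squared_error(Y_test, Y_hat))
print(f"test MAE:  {test_mae:.3f}")
print(f"test RMSE: {test_rmse:.3f}")

# Leave-one-out CV: each of the 15 points is used once as a single test point
loo_scores = cross_val_score(
    GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5, random_state=0),
    X, Y, cv=LeaveOneOut(), scoring="neg_mean_absolute_error")
loo_mae = -loo_scores.mean()
print(f"leave-one-out MAE: {loo_mae:.3f}")
```

MAE is used for the leave-one-out scores because each fold has a single test point, for which R² is not defined.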
