I'm using `BayesSearchCV` from scikit-optimize to optimise an XGBoost model to fit some data I have. While the model fits fine, I am puzzled by the scores provided in the diagnostic information and am unable to replicate them.
Here's an example script using the Boston house prices dataset to illustrate my point:
```python
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older version
import numpy as np
import pandas as pd
from xgboost.sklearn import XGBRegressor
from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer
from sklearn.model_selection import KFold, train_test_split

boston = load_boston()

# Dataset info:
print(boston.keys())
print(boston.data.shape)
print(boston.feature_names)
print(boston.DESCR)

# Put data into a dataframe and label the column headers:
data = pd.DataFrame(boston.data)
data.columns = boston.feature_names

# Add target variable to dataframe
data['PRICE'] = boston.target

# Split into X and y
X, y = data.iloc[:, :-1], data.iloc[:, -1]

# Split into training and validation datasets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

# For cross-validation, split training data into 5 folds.
# No shuffling, so random_state has no effect (newer scikit-learn raises an error if it is set).
xgb_kfold = KFold(n_splits=5)

# Run fit
xgb_params = {'n_estimators': Integer(10, 3000, 'uniform'),
              'max_depth': Integer(2, 100, 'uniform'),
              'subsample': Real(0.25, 1.0, 'uniform'),
              'learning_rate': Real(0.0001, 0.5, 'uniform'),
              'gamma': Real(0.0001, 1.0, 'uniform'),
              'colsample_bytree': Real(0.0001, 1.0, 'uniform'),
              'colsample_bylevel': Real(0.0001, 1.0, 'uniform'),
              'colsample_bynode': Real(0.0001, 1.0, 'uniform'),
              'min_child_weight': Real(1, 6, 'uniform')}
xgb_fit_params = {'early_stopping_rounds': 15, 'eval_metric': 'mae', 'eval_set': [[X_val, y_val]]}
xgb_pipe = XGBRegressor(random_state=42, objective='reg:squarederror', n_jobs=10)
xgb_cv = BayesSearchCV(xgb_pipe, xgb_params, cv=xgb_kfold, n_iter=5, n_jobs=1, random_state=42,
                       verbose=4, scoring=None, fit_params=xgb_fit_params)
xgb_cv.fit(X_train, y_train)
```
After running this, `xgb_cv.best_score_` is 0.816 and `xgb_cv.best_index_` is 3. Looking at `xgb_cv.cv_results_`, I want to find the score of this best candidate on each fold:
```python
best = xgb_cv.best_index_
for k in range(5):
    print(xgb_cv.cv_results_[f'split{k}_test_score'][best])
```

Which gives:

```
0.8023562337946979
0.8337404778903412
0.861370681263761
0.8749312273014963
0.7058815015739375
```
I'm not sure what's being calculated here, because `scoring` is set to `None` in my code. XGBoost's documentation isn't much help, but according to the docstring shown by `xgb_cv.best_estimator_.score?` it is supposed to be the R² of the predicted values. Either way, I can't reproduce these values when I manually calculate the score on each fold of the data used in the fit:
```python
# First, get the actual indices of the data in each fold:
kfold_indexes = {}
for kfold_cnt, (train_index, test_index) in enumerate(xgb_kfold.split(X_train)):
    kfold_indexes[kfold_cnt] = {'train': train_index, 'test': test_index}

# Next, calculate the score for each fold
for p in range(5):
    test_idx = kfold_indexes[p]['test']
    print(xgb_cv.best_estimator_.score(X_train.iloc[test_idx], y_train.iloc[test_idx]))
```
Which gives me the following:

```
0.9954929618573786
0.994844803666101
0.9963108152027245
0.9962274544089832
0.9931314653538819
```
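Both sets of numbers above should be the same metric. With `scoring=None`, the search falls back to the estimator's own `score` method, which for a regressor such as `XGBRegressor` is the coefficient of determination R² = 1 - SS_res / SS_tot, so the gap is unlikely to come from the metric itself. A minimal pure-Python sketch of that formula (`r2_score_py` is a hypothetical helper standing in for `sklearn.metrics.r2_score`):

```python
def r2_score_py(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
print(r2_score_py(y_true, y_true))                  # 1.0 (perfect predictions)
print(r2_score_py(y_true, [6.0] * 4))               # 0.0 (always predicting the mean)
print(r2_score_py(y_true, [2.5, 5.5, 6.5, 9.5]))    # small errors, close to 1
```

R² is 1 only for perfect predictions and 0 for a model no better than predicting the mean, so a score near 0.995 means the model almost reproduces the targets it is evaluated on.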
How is `BayesSearchCV` calculating the scores for each fold, and why can't I replicate them using the `score` function? I would be most grateful for any assistance with this issue.
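A likely explanation for the ~0.99 vs ~0.80 gap: by default (`refit=True`), `BayesSearchCV` retrains the best candidate on the whole of `X_train` once the search finishes, so `best_estimator_` has already seen every fold scored in the manual loop. The `splitN_test_score` entries instead come from models fitted on the other four folds only. The sketch below illustrates the effect in pure Python on synthetic data. It swaps in a 1-nearest-neighbour regressor (a deliberately memorising stand-in, not XGBoost), and `kfold_indices`, `fit_1nn` and `r2` are hypothetical helpers that mimic an unshuffled `KFold(n_splits=5)` split and R².

```python
import random


def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    As with an unshuffled KFold, the first n_samples % n_splits folds get one extra sample.
    """
    start = 0
    for i in range(n_splits):
        size = n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def fit_1nn(xs, ys):
    """'Fitting' a 1-nearest-neighbour regressor just memorises the training pairs."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Return the target of the closest stored input.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


rng = random.Random(42)
X = [i / 10 for i in range(100)]
rng.shuffle(X)                                   # distinct inputs in random order
y = [2.0 * x + rng.gauss(0.0, 1.0) for x in X]   # noisy linear target

full_model = fit_1nn(X, y)   # analogous to best_estimator_: refit on all the data
cv_scores, refit_scores = [], []
for train, test in kfold_indices(len(X), 5):
    fold_model = fit_1nn([X[i] for i in train], [y[i] for i in train])
    y_test = [y[i] for i in test]
    cv_scores.append(r2(y_test, [fold_model(X[i]) for i in test]))     # like splitN_test_score
    refit_scores.append(r2(y_test, [full_model(X[i]) for i in test]))  # like the manual loop

print("out-of-fold R2:", [round(s, 3) for s in cv_scores])
print("refit-model R2:", [round(s, 3) for s in refit_scores])
```

The refit model scores exactly 1.0 on every fold because each test point is in its own training data, while the out-of-fold scores are lower. To reproduce the real fold scores, each fold would need a fresh, unfitted copy of the estimator, e.g. `sklearn.base.clone(xgb_pipe).set_params(**xgb_cv.best_params_)`, fitted only on that fold's training indices with the same fit parameters.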
(Also, manually calculating the mean of the five `splitN_test_score` values above gives 0.8156560..., while `xgb_cv.best_score_` gives 0.8159277.... Not sure why there's a small difference here.)
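The small gap is consistent with a test-size-weighted average. Assuming the installed scikit-optimize version defaults to `iid=True` (an option mirroring scikit-learn's old `iid` parameter), `mean_test_score`, and hence `best_score_`, weights each fold's score by its number of test samples. `train_test_split` with `test_size=0.2` leaves 404 of Boston's 506 rows in `X_train`, and an unshuffled `KFold(n_splits=5)` splits those into folds of 81, 81, 81, 81 and 80 samples. This sketch plugs in the five fold scores reported above:

```python
# Fold scores for best_index_, copied from cv_results_ above.
fold_scores = [0.8023562337946979, 0.8337404778903412, 0.861370681263761,
               0.8749312273014963, 0.7058815015739375]

# Unshuffled KFold: the first n_samples % n_splits folds get one extra sample.
n_samples, n_splits = 404, 5
fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
              for i in range(n_splits)]   # [81, 81, 81, 81, 80]

plain_mean = sum(fold_scores) / n_splits
weighted_mean = sum(s * n for s, n in zip(fold_scores, fold_sizes)) / sum(fold_sizes)

print(f"unweighted mean:    {plain_mean:.7f}")     # 0.8156560, the manual mean
print(f"test-size-weighted: {weighted_mean:.7f}")  # 0.8159277, the reported best_score_
```

Because the last fold has one fewer sample and also the lowest score, weighting by fold size nudges the average up slightly, which matches the reported `best_score_` to the digits shown.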