This question bugged me as well, and while the others made good points, they did not answer all aspects of OP's question.
The true answer is: the divergence in scores for increasing k is due to the chosen metric R2 (coefficient of determination). For metrics such as MSE, MSLE or MAE there is no difference between using cross_val_score and cross_val_predict.
See the definition of R2:
R^2 = 1 - MSE(ground truth, prediction) / MSE(ground truth, mean(ground truth))
The mean(ground truth) term in the denominator explains why the score starts to differ for increasing k: the more splits we have, the fewer samples there are in each test fold, and the higher the variance of that test fold's mean. Conversely, for small k the mean of a test fold does not differ much from the full ground-truth mean, since the sample size is still large enough to keep its variance small.
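As a quick sanity check of that definition (the values below are arbitrary and purely for illustration), computing R2 by hand from the two MSE terms gives the same number as sklearn's r2_score:
import numpy as np
from sklearn.metrics import mean_squared_error as mse, r2_score

g = np.array([1.0, 2.0, 3.0, 4.0])    # ground truth
p = np.array([1.5, 1.5, 3.5, 3.0])    # predictions
baseline = np.full_like(g, g.mean())  # constant prediction: the ground-truth mean

manual_r2 = 1 - mse(g, p) / mse(g, baseline)
print(manual_r2, r2_score(g, p))      # both print the same value (0.65)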
Proof:
import numpy as np
from sklearn.metrics import mean_absolute_error as mae
from sklearn.metrics import mean_squared_log_error as msle, r2_score

# random predictions and ground truth on deliberately different scales,
# so the scores are clearly bad but easy to compare across metrics
predictions = np.random.rand(1000) * 100
groundtruth = np.random.rand(1000) * 20

def scores_for_increasing_k(score_func):
    # score all pooled predictions at once -- this is what scoring the
    # output of cross_val_predict amounts to
    skewed_score = score_func(groundtruth, predictions)
    print(f'skewed score (from cross_val_predict): {skewed_score}')
    for k in (2, 4, 5, 10, 20, 50, 100, 200, 250):
        fold_preds = np.split(predictions, k)
        fold_gtruth = np.split(groundtruth, k)
        # score each fold separately, then average -- this is what cross_val_score does
        correct_score = np.mean([score_func(g, p) for g, p in zip(fold_gtruth, fold_preds)])
        print(f'correct CV for k={k}: {correct_score}')

for name, score in [('MAE', mae), ('MSLE', msle), ('R2', r2_score)]:
    print(name)
    scores_for_increasing_k(score)
    print()
The output will look like this (exact numbers vary per run, since the data is random):
MAE
skewed score (from cross_val_predict): 42.25333901481263
correct CV for k=2: 42.25333901481264
correct CV for k=4: 42.25333901481264
correct CV for k=5: 42.25333901481264
correct CV for k=10: 42.25333901481264
correct CV for k=20: 42.25333901481264
correct CV for k=50: 42.25333901481264
correct CV for k=100: 42.25333901481264
correct CV for k=200: 42.25333901481264
correct CV for k=250: 42.25333901481264
MSLE
skewed score (from cross_val_predict): 3.5252449697327175
correct CV for k=2: 3.525244969732718
correct CV for k=4: 3.525244969732718
correct CV for k=5: 3.525244969732718
correct CV for k=10: 3.525244969732718
correct CV for k=20: 3.525244969732718
correct CV for k=50: 3.5252449697327175
correct CV for k=100: 3.5252449697327175
correct CV for k=200: 3.5252449697327175
correct CV for k=250: 3.5252449697327175
R2
skewed score (from cross_val_predict): -74.5910282783694
correct CV for k=2: -74.63582817089443
correct CV for k=4: -74.73848598638291
correct CV for k=5: -75.06145142821893
correct CV for k=10: -75.38967601572112
correct CV for k=20: -77.20560102267272
correct CV for k=50: -81.28604960074824
correct CV for k=100: -95.1061197684949
correct CV for k=200: -144.90258384605787
correct CV for k=250: -210.13375041871123
Of course, there is another effect, not shown here, which was mentioned by others: with increasing k, more models are trained, each on more samples and validated on fewer samples. This also affects the final scores, but it is not induced by the choice between cross_val_score and cross_val_predict.
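For completeness, here is a minimal sketch of how to compute both quantities side by side with the actual sklearn helpers, so the effect can be checked on a real estimator (the LinearRegression model and the make_regression data are arbitrary choices for illustration):
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)
model = LinearRegression()

for k in (2, 10, 50, 100):
    # mean of per-fold R2 scores -- what cross_val_score reports
    per_fold = cross_val_score(model, X, y, cv=k, scoring='r2').mean()
    # pooled out-of-fold predictions scored once -- the cross_val_predict route
    pooled = r2_score(y, cross_val_predict(model, X, y, cv=k))
    print(f'k={k}: mean per-fold R2 = {per_fold:.4f}, pooled R2 = {pooled:.4f}')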