I'm computing cross-validation scores on my dataset using cross_val_score
and KFold from scikit-learn.
In particular, my code looks like this:

from sklearn.model_selection import KFold, cross_val_score

cross_val_score(estimator=model, X=X, y=y, scoring='r2', cv=KFold(shuffle=True))
My question is whether it is common practice to set shuffle=True
inside KFold. If I do, the returned r2 scores are:
[0.5934, 0.60432, 0.45689, 0.6875, 0.5678]
If I set shuffle=False
instead, it returns
[0.3987, 0.4576, 0.3234, 0.4567, 0.3233]
I would not want points that were used for training in one iteration to be reused in the test fold of a later iteration, since that would give an optimistic cross-validation score. How should I explain that I get better scores with shuffle=True?
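To check my understanding of the mechanics, I wrote a minimal pure-Python sketch of how I believe KFold assigns indices (the function kfold_indices is my own, not part of scikit-learn): shuffle only changes *which* samples land in each fold, but in both cases the test folds are disjoint and together cover every sample exactly once, so no point is tested twice.

```python
import random

def kfold_indices(n_samples, n_splits=5, shuffle=False, seed=0):
    """Yield (train, test) index lists, mimicking KFold.split."""
    indices = list(range(n_samples))
    if shuffle:
        # Shuffling happens once, up front: it randomizes the
        # assignment of samples to folds, nothing else.
        random.Random(seed).shuffle(indices)
    # First n_samples % n_splits folds get one extra sample.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

for shuffle in (False, True):
    all_test = []
    for train, test in kfold_indices(20, n_splits=5, shuffle=shuffle):
        assert not set(train) & set(test)  # train/test never overlap
        all_test.extend(test)
    # Every sample appears in exactly one test fold, shuffled or not.
    assert sorted(all_test) == list(range(20))
```

So if my sketch is right, shuffle=True cannot leak training points into later test folds; it only breaks up any ordering in the data.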