I am trying to run cross_val_score in sklearn with a split that I supply myself. The sklearn documentation gives the following example:
>>> import numpy as np
>>> from sklearn.model_selection import PredefinedSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([0, 0, 1, 1])
>>> test_fold = [0, 1, -1, 1]
>>> ps = PredefinedSplit(test_fold)
>>> ps.get_n_splits()
2
>>> print(ps)
PredefinedSplit(test_fold=array([ 0,  1, -1,  1]))
>>> for train_index, test_index in ps.split():
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [1 2 3] TEST: [0]
TRAIN: [0 2] TEST: [1 3]
I am having trouble understanding this example. In particular:
- why does ps.get_n_splits() return 2 in this example; and
- why does the test_fold array lead to the splits shown at the bottom of the code snippet?
Additionally, if I pass the ps object to the cross_val_score function in sklearn, will it perform cross-validation with these two splits?
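For reference, this is roughly the call I have in mind. It is only a minimal sketch: the data and test_fold are copied from the documentation example, while LogisticRegression is a placeholder estimator I picked for illustration and is not part of that example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import PredefinedSplit, cross_val_score

# Same toy data and test_fold as in the documentation example.
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])
test_fold = [0, 1, -1, 1]
ps = PredefinedSplit(test_fold)

# Placeholder estimator (my assumption, not from the documentation).
clf = LogisticRegression()

# Pass the PredefinedSplit object as the cv argument.
scores = cross_val_score(clf, X, y, cv=ps)
print(scores)
```

My expectation is that scores would contain one entry per split defined by ps, but I would like to confirm that this is what actually happens.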