
Basic cross validation:

from sklearn.model_selection import cross_val_score
from sklearn import datasets, svm

X, y = datasets.load_iris(return_X_y=True)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X, y, cv=5)

Suppose there is other data, X2 and y2, which I would like to concatenate with X and y, but I don't want it to participate in cross-validation (in all 5 folds, X2 and y2 should be part of the training set).

Is it still possible to use cross_val_score from scikit-learn to do so?

In other words, is partial cross-validation possible with cross_val_score, where part of the data always remains in the training set?

P.S.: X2 and y2 are actually synthesized complementary data, and I would like to know whether their presence helps the model perform better or not. So for a fair comparison they shouldn't be part of the test set.

Sajad.sni
  • No way with `cross_val_score`; but you can easily do it manually - see [this answer](https://stackoverflow.com/questions/54201464/cross-validation-metrics-in-scikit-learn-for-each-data-split/54202609#54202609) of mine, and append the validation data `X[val_index]` and `y[val_index]` in each fold with `X2` and `y2` respectively. – desertnaut Sep 19 '20 at 13:35
  • Thank you for your response. Using `KFold` is actually an acceptable solution, but since it is performed in a for loop, I think there should be a more optimal solution that avoids it. I found that the cv argument of `cross_val_score` can be passed an iterable of splits. I tried working with it but got weird results! Don't you think it could be a good way to get past this problem? – Sajad.sni Sep 21 '20 at 07:12
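As a sketch of the approach discussed in the comments: `cross_val_score` accepts for `cv` any iterable yielding `(train_indices, test_indices)` pairs, so one can build the splits over the original rows only and append the indices of the extra rows to every training fold. The `X2`/`y2` below are hypothetical stand-ins (noisy copies of a few iris rows) since the real synthesized data is not shown.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import KFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)

# Hypothetical synthesized complementary data standing in for X2, y2
rng = np.random.RandomState(0)
X2 = X[:20] + rng.normal(scale=0.1, size=X[:20].shape)
y2 = y[:20]

# Concatenate; the extra rows occupy indices len(X) .. len(X_all)-1
X_all = np.concatenate([X, X2])
y_all = np.concatenate([y, y2])
extra = np.arange(len(X), len(X_all))

# Split only the original rows, then add the extra indices to every
# training fold, so X2/y2 are always trained on and never tested on
kf = KFold(n_splits=5, shuffle=True, random_state=0)
custom_cv = [(np.concatenate([train, extra]), test)
             for train, test in kf.split(X)]

clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, X_all, y_all, cv=custom_cv)
```

Each of the 5 test folds contains only indices below `len(X)`, so the synthesized rows never leak into evaluation, while every training fold includes them.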

0 Answers