I'm trying to perform stratified k-fold cross-validation in Python, and I read the following in the documentation:
I'm not sure exactly what this means. Could someone explain when exactly cross_val_score uses the StratifiedKFold strategy?
When you perform k-fold cross-validation, you split your training set into k folds, and each fold takes a turn as the validation set. StratifiedKFold ensures that each fold contains the same proportion of each label as your original training set.
For example, say you are training a spam vs. not-spam classifier. Your training set contains 50k samples, 10k of which are spam. With 5-fold cross-validation, you split the training set into 5 folds of 10k samples each. With stratification, each fold is selected so that it keeps the 4:1 ratio of not spam to spam, i.e. 8k not-spam and 2k spam samples per fold.
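To make the 4:1 example concrete, here is a simplified, standard-library-only sketch of stratified fold assignment. It is not scikit-learn's actual StratifiedKFold implementation (which also supports shuffling and may distribute leftover samples differently); the helper name `stratified_folds` and the synthetic label list are hypothetical. It groups sample indices by label and deals each class round-robin across the folds, so every fold keeps the original class ratio.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits):
    """Hypothetical helper: assign each sample index to one of n_splits folds
    so that every fold gets (as nearly as possible) the same share of each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal this class's samples round-robin across the folds.
        for i, idx in enumerate(indices):
            folds[i % n_splits].append(idx)
    return folds


# Synthetic data matching the example: 50k samples, 10k of them spam.
labels = ["spam"] * 10_000 + ["not spam"] * 40_000
folds = stratified_folds(labels, n_splits=5)

for fold in folds:
    # Each fold should hold 10k samples: 8k not spam and 2k spam (4:1).
    print(len(fold), Counter(labels[i] for i in fold))
```

Without stratification (for example, plain KFold without shuffling on this sorted label list), the first two folds would contain only spam and the remaining three only not spam.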
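EDIT: Sorry, I misunderstood your original question. To expand on @unutbu's comments below: when cv is an integer or left at its default, cross_val_score uses StratifiedKFold if the estimator is a classifier and the target is binary or multiclass. So you want to confirm that the classifier you are using is a subclass of ClassifierMixin. You can check this by inspecting the class's method resolution order (MRO).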
Suppose you were using the classifier KNeighborsClassifier:
>>> from sklearn.neighbors import KNeighborsClassifier
>>> clf = KNeighborsClassifier()
>>> type(clf)
<class 'sklearn.neighbors.classification.KNeighborsClassifier'>
>>> type(clf).mro()
[<class 'sklearn.neighbors.classification.KNeighborsClassifier'>, ..., <class 'sklearn.base.ClassifierMixin'>, <type 'object'>]
Notice that the second-to-last class in the resolution order is ClassifierMixin.
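A more direct check than reading the MRO by eye is `issubclass`, or scikit-learn's own `sklearn.base.is_classifier`. The sketch below, assuming a reasonably recent scikit-learn release, also calls `sklearn.model_selection.check_cv`, the helper `cross_val_score` uses internally to turn an integer `cv` into a splitter, so you can see which strategy gets picked. `LinearRegression` and the toy targets are illustrative choices, not part of the original answer.

```python
from sklearn.base import ClassifierMixin, is_classifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import check_cv
from sklearn.neighbors import KNeighborsClassifier

# Toy binary target (illustrative data only).
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

results = {}
for est in (KNeighborsClassifier(), LinearRegression()):
    # Same decision cross_val_score makes when cv is an integer.
    splitter = check_cv(5, y, classifier=is_classifier(est))
    results[type(est).__name__] = type(splitter).__name__
    print(type(est).__name__,
          issubclass(type(est), ClassifierMixin),
          type(splitter).__name__)

# Stratification also requires a binary/multiclass target: with a
# continuous y, even a classifier gets plain KFold.
y_continuous = [0.1, 0.7, 1.3, 2.2, 3.5, 0.4, 0.9, 1.8, 2.6, 3.1]
continuous_splitter = check_cv(5, y_continuous, classifier=True)
print("continuous y ->", type(continuous_splitter).__name__)
```

If you pass an explicit splitter object as `cv` instead of an integer, `cross_val_score` uses that splitter as-is, so none of this automatic selection applies.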