Can someone please explain how to use ensembles in sklearn with partial_fit? I don't want to retrain my model. Alternatively, can we pass pre-trained models for ensembling? I have seen that the voting classifier, for example, does not support training with partial_fit.
5 Answers
The mlxtend library has an implementation, EnsembleVoteClassifier, which allows you to pass in pre-fitted models. For example, if you have three pre-trained models clf1, clf2, and clf3, the following code would work:
from mlxtend.classifier import EnsembleVoteClassifier
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1, 1, 1], fit_base_estimators=False)
When set to False, the fit_base_estimators argument of EnsembleVoteClassifier ensures that the classifiers are not refit.
In general, when looking for more advanced technical features that scikit-learn does not provide, look to mlxtend as a first point of reference.
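For reference, here is a minimal end-to-end sketch; the toy data and base classifier choices are illustrative assumptions, not part of the original answer:

import numpy as np
from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy data
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([0, 0, 1, 1])

# Pre-train the base estimators separately
clf1 = LogisticRegression().fit(X, y)
clf2 = GaussianNB().fit(X, y)
clf3 = DecisionTreeClassifier().fit(X, y)

# fit_base_estimators=False keeps the pre-trained weights;
# the eclf.fit call below only records the class labels
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3],
                              weights=[1, 1, 1],
                              fit_base_estimators=False)
eclf.fit(X, y)  # does not retrain clf1, clf2, clf3
print(eclf.predict(X))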

- When I do just this with fit models, I get: `NotFittedError: Estimator not fitted, call 'fit' before exploiting the model.` – Matthew Mar 20 '18 at 19:21
- I found that you have to run `eclf.fit` anyway. However, if these classifiers were trained on different training data, what would one be passing into the `eclf.fit` command? – Matthew Mar 20 '18 at 19:44
- Hi @Matthew. In the case of EnsembleVoteClassifier with refit=False, `eclf.fit` does not do anything. The `fit` method is just standard nomenclature used by the machine learning algorithms in the scikit-learn package. I assume that all classifiers in mlxtend and scikit-learn derive from a base class which requires the use of the fit method. – Akanni Apr 18 '18 at 00:05
- This answer actually solves the problem, thank you! However, just to get some runnable code, you should add a `from` before your `import`. [Here](https://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/) are the docs, for anyone interested. – Bobson Dugnutt Jun 02 '18 at 15:28
- Apparently `refit` is now renamed to `fit_base_estimators` – MrObjectOriented Mar 10 '21 at 11:22
- @ReeshabhRanjan Thanks for pointing it out, I've corrected it to align with the newest API – Akanni Mar 11 '21 at 11:44
- @RilwanAdewoyin please also update the text below the code snippet. – MrObjectOriented Mar 11 '21 at 20:08
Workaround:
VotingClassifier checks that estimators_ is set in order to determine whether it is fitted, and it uses the estimators in the estimators_ list for prediction. If you have pre-trained classifiers, you can put them in estimators_ directly, as in the code below.
However, it also uses a LabelEncoder, so it assumes labels are like 0, 1, 2, ... and you also need to set le_ and classes_ (see below).
from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

clf_list = [clf1, clf2, clf3]

eclf = VotingClassifier(estimators=[('1', clf1), ('2', clf2), ('3', clf3)], voting='soft')
eclf.estimators_ = clf_list
eclf.le_ = LabelEncoder().fit(y)
eclf.classes_ = eclf.le_.classes_

# Now it will work without calling fit
eclf.predict(X)
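A self-contained sketch of this workaround is below; the toy data and classifier choices are assumptions for illustration, and depending on your scikit-learn version additional fitted attributes may be required:

import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array(['a', 'a', 'b', 'b'])

# Train the base estimators independently
clf1 = LogisticRegression().fit(X, y)
clf2 = GaussianNB().fit(X, y)

# Assemble the ensemble without ever calling eclf.fit
eclf = VotingClassifier(estimators=[('lr', clf1), ('nb', clf2)], voting='soft')
eclf.estimators_ = [clf1, clf2]
eclf.le_ = LabelEncoder().fit(y)
eclf.classes_ = eclf.le_.classes_

print(eclf.predict(X))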

Unfortunately, this is currently not possible with scikit-learn's VotingClassifier.
But you can use http://sebastianraschka.com/Articles/2014_ensemble_classifier.html (from which VotingClassifier is implemented) to try and implement your own voting classifier which can take pre-fitted models.
We can also look at the source code here and modify it for our use:
from sklearn.preprocessing import LabelEncoder
import numpy as np

le_ = LabelEncoder()

# When you do partial_fit, the first fit of any classifier requires
# all available labels (output classes), so you should supply
# all of the same labels here in y.
le_.fit(y)

# Fill the list below with fitted or partially fitted estimators
clf_list = [clf1, clf2, clf3, ... ]

# Fill weights -> array-like, shape = [n_classifiers] or None
weights = [clf1_wgt, clf2_wgt, ... ]
weights = None

# For hard voting (predictions are label-encoded so that
# bincount and inverse_transform work for arbitrary labels):
pred = np.asarray([le_.transform(clf.predict(X)) for clf in clf_list]).T
pred = np.apply_along_axis(lambda x:
                           np.argmax(np.bincount(x, weights=weights)),
                           axis=1,
                           arr=pred.astype('int'))

# For soft voting:
pred = np.asarray([clf.predict_proba(X) for clf in clf_list])
pred = np.average(pred, axis=0, weights=weights)
pred = np.argmax(pred, axis=1)

# Finally, reverse transform the labels for correct output:
pred = le_.inverse_transform(pred)
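As a concrete check, here is a runnable sketch of the hard-voting path with partially fitted estimators; the toy data and SGD classifiers are assumptions for illustration:

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import SGDClassifier

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array(['no', 'no', 'yes', 'yes'])

le_ = LabelEncoder().fit(y)

# The first partial_fit of each classifier must receive all output classes
clf1 = SGDClassifier(random_state=1).partial_fit(X, y, classes=le_.classes_)
clf2 = SGDClassifier(loss='perceptron', random_state=2).partial_fit(X, y, classes=le_.classes_)
clf_list = [clf1, clf2]

# Hard voting over the label-encoded predictions
pred = np.asarray([le_.transform(clf.predict(X)) for clf in clf_list]).T
pred = np.apply_along_axis(lambda x: np.argmax(np.bincount(x)), axis=1, arr=pred)
print(le_.inverse_transform(pred))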

It's not too hard to implement the voting. Here's my implementation:
import numpy as np

class VotingClassifier(object):
    """Implements a voting classifier for pre-trained classifiers"""

    def __init__(self, estimators):
        self.estimators = estimators

    def predict(self, X):
        # Collect each estimator's predictions, one column per model
        Y = np.zeros([X.shape[0], len(self.estimators)], dtype=int)
        for i, clf in enumerate(self.estimators):
            Y[:, i] = clf.predict(X)
        # Apply majority voting across the columns for each sample
        y = np.zeros(X.shape[0])
        for i in range(X.shape[0]):
            y[i] = np.argmax(np.bincount(Y[i, :]))
        return y
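A possible usage sketch follows; the classifier choices and toy data are assumptions for illustration, and note that this class expects integer class labels:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([0, 0, 1, 1])

# Pre-train each estimator separately (possibly on different data)
estimators = [LogisticRegression().fit(X, y),
              GaussianNB().fit(X, y),
              DecisionTreeClassifier().fit(X, y)]

vc = VotingClassifier(estimators)
print(vc.predict(X))  # majority vote of the three pre-trained models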

While the mlxtend implementation works, you still need to call the fit function of the EnsembleVoteClassifier. The fit function doesn't really modify any parameters; rather, it checks the possible label values. In the example below, you have to give eclf2.fit an array containing all the possible values that appear in the original y (in this case 1 and 2). It doesn't matter what you pass for X.
import numpy as np
from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

for clf in (clf1, clf2, clf3):
    clf.fit(X, y)

# Note: newer mlxtend versions rename refit to fit_base_estimators
eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting="soft", refit=False)
eclf2.fit(None, np.array([1, 2]))
print(eclf2.predict(X))
