
As opposed to just trying each one with sparse input and getting an error. Something like the list of classifiers shown here: AdaBoostClassifier with different base learners

Robert E Mealey

1 Answer

OK, answering my own question for posterity: adapting the original post with a try/except, as suggested by Andreas. I definitely should have thought of that.

from scipy.sparse import csc_matrix
from sklearn.utils import all_estimators  # lived in sklearn.utils.testing in older releases
import numpy as np
import random

# Random binary labels and a random sparse binary feature matrix
y = np.array([random.randrange(0, 2) for i in range(1000)])
X = csc_matrix(np.array([[random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)]])).T

# Try to fit every classifier on the sparse matrix; print the ones that succeed
for name, Clf in all_estimators(type_filter='classifier'):
    try:
        clf = Clf()
        clf.fit(X, y)
        print(name)
    except Exception:
        pass

which gave this list:

BernoulliNB
DummyClassifier
KNeighborsClassifier
LabelPropagation
LabelSpreading
LinearSVC
LogisticRegression
MultinomialNB
NearestCentroid
NuSVC
PassiveAggressiveClassifier
Perceptron
RadiusNeighborsClassifier
RidgeClassifier
RidgeClassifierCV
SGDClassifier
SVC

I know this is quick and dirty: the blanket except also swallows any classifier that fails for a reason other than TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. And just so everyone knows how conscientious I am: the only one that fails for some other reason is EllipticEnvelope. I checked. :)

Also, the non-tree-based ensemble methods like BaggingClassifier and AdaBoostClassifier can take sparse input if you change base_estimator from the default to one that accepts sparse input and has all the necessary methods/attributes, and if you use a sparse representation that supports indexing (csr_matrix or csc_matrix).
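For example, here is a minimal sketch of that ensemble trick, using MultinomialNB (which is on the list above and supports sample_weight, which AdaBoost requires of its base learner) as the sparse-capable base estimator. Note the constructor argument is base_estimator in older scikit-learn and was renamed to estimator in 1.2; the try/except below covers both. The synthetic data here is just for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X = csr_matrix(rng.integers(0, 2, size=(1000, 3)))
y = X.toarray()[:, 0]  # a label the base learner can actually pick up

# MultinomialNB accepts sparse input and supports sample_weight,
# which is all AdaBoost needs from its base learner.
try:
    clf = AdaBoostClassifier(estimator=MultinomialNB())       # scikit-learn >= 1.2
except TypeError:
    clf = AdaBoostClassifier(base_estimator=MultinomialNB())  # older releases

clf.fit(X, y)  # the csr_matrix is passed through to each boosting round
print(clf.score(X, y))
```

Swap in a tree and the same fit call would raise the dense-data TypeError above; with a sparse-capable base estimator it goes through untouched.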
