
As opposed to just trying each one with sparse input and getting an error. Something like the list of classifiers shown here: AdaBoostClassifier with different base learners

Robert E Mealey

1 Answer

OK, answering my own question for posterity: adapting the original post with a try/except, as suggested by Andreas. I definitely should have thought of that.

from scipy.sparse import csc_matrix
from sklearn.utils import all_estimators  # lived in sklearn.utils.testing in older releases
import numpy as np
import random

# Random binary labels and a random sparse binary feature matrix
y = np.array([random.randrange(0, 2) for i in range(1000)])
X = csc_matrix(np.array([[random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)]])).T

# Try to fit every classifier on the sparse matrix; print the ones that succeed
for name, Clf in all_estimators(type_filter='classifier'):
    try:
        clf = Clf()
        clf.fit(X, y)
        print(name)
    except Exception:
        pass

which gave this list:

BernoulliNB
DummyClassifier
KNeighborsClassifier
LabelPropagation
LabelSpreading
LinearSVC
LogisticRegression
MultinomialNB
NearestCentroid
NuSVC
PassiveAggressiveClassifier
Perceptron
RadiusNeighborsClassifier
RidgeClassifier
RidgeClassifierCV
SGDClassifier
SVC

I know this is quick and dirty: the blanket except also swallows any classifier that fails for a reason other than TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. And just so everyone knows how conscientious I am: the only one that fails for some other reason is EllipticEnvelope. I checked. :)

Also, the non-tree-based ensemble methods like BaggingClassifier and AdaBoostClassifier can take sparse input if you change base_estimator from the default to one that accepts sparse input and has all the necessary methods/attributes, and if you use a sparse representation that supports indexing (csr_matrix or csc_matrix).
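For example, here is a minimal sketch of that ensemble trick, using MultinomialNB (which is on the list above and supports sample_weight, which AdaBoost requires of its base learner) as the sparse-capable base estimator. Note the constructor argument is base_estimator in older scikit-learn and was renamed to estimator in 1.2; the try/except below covers both. The synthetic data here is just for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
X = csr_matrix(rng.integers(0, 2, size=(1000, 3)))
y = X.toarray()[:, 0]  # a label the base learner can actually pick up

# MultinomialNB accepts sparse input and supports sample_weight,
# which is all AdaBoost needs from its base learner.
try:
    clf = AdaBoostClassifier(estimator=MultinomialNB())       # scikit-learn >= 1.2
except TypeError:
    clf = AdaBoostClassifier(base_estimator=MultinomialNB())  # older releases

clf.fit(X, y)  # the csr_matrix is passed through to each boosting round
print(clf.score(X, y))
```

Swap in a tree and the same fit call would raise the dense-data TypeError above; with a sparse-capable base estimator it goes through untouched.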
