(As opposed to just trying them with sparse input and getting an error.) I'm after something like the listing of classifiers shown in AdaBoostClassifier with different base learners.
You can use a similar method and `try` to fit them with sparse data. – Andreas Mueller Nov 10 '14 at 16:44
1 Answer
OK, answering my own question for posterity, adapting the original post with `try` as suggested by Andreas. I definitely should have thought of that.
```python
from scipy.sparse import csc_matrix
from sklearn.utils import all_estimators  # sklearn.utils.testing in older scikit-learn versions
import numpy as np
import random

y = np.array([random.randrange(0, 2) for i in range(1000)])
X = csc_matrix(np.array([[random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)],
                         [random.randrange(0, 2) for i in range(1000)]])).T

for name, Clf in all_estimators(type_filter='classifier'):
    try:
        clf = Clf()
        clf.fit(X, y)
        print(name)  # only printed if fitting on sparse data succeeded
    except Exception:
        pass
```
which gave this list:
BernoulliNB
DummyClassifier
KNeighborsClassifier
LabelPropagation
LabelSpreading
LinearSVC
LogisticRegression
MultinomialNB
NearestCentroid
NuSVC
PassiveAggressiveClassifier
Perceptron
RadiusNeighborsClassifier
RidgeClassifier
RidgeClassifierCV
SGDClassifier
SVC
I know this is quick and dirty, and that it misses any classifier that fails for an error other than `TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.` Just so everyone knows how conscientious I am: the only one that fails for some other reason is `EllipticEnvelope`. I checked. :)

Also, the non-tree-based ensemble methods like `BaggingClassifier` and `AdaBoostClassifier` can take sparse input if you change the `base_estimator` from the default to one that can take sparse input and has all the necessary methods/attributes, and you use a sparse representation that can be indexed (`csr_matrix` or `csc_matrix`).
