I am using scikit-learn to classify news articles and I was wondering which classifier I should use. I have a training set with labelled data, which makes this a supervised learning problem, and an article can belong to multiple categories (say finance and politics), which makes this a multi-label scenario.
I am currently using CountVectorizer for the preprocessing, then LinearSVC with MultiOutputClassifier to build the model. I chose LinearSVC by following the flow chart here: http://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.
classifier = MultiOutputClassifier(LinearSVC())
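To make the setup concrete, here is a minimal sketch of roughly what this looks like end to end. The articles, the labels, and the names `articles`, `labels`, `mlb` and `model` are made up for illustration and are not my real data. Using MultiLabelBinarizer to build the 0/1 label matrix and a Pipeline to chain the steps are assumptions about one reasonable way to wire it together.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy articles and label sets, invented purely for illustration.
articles = [
    "stock markets rally as central bank cuts interest rates",
    "parliament votes on new election law",
    "government budget raises taxes on bank profits",
    "investors buy shares after strong earnings report",
    "president meets opposition leaders before election",
    "finance minister defends budget in parliament",
]
labels = [
    ["finance"],
    ["politics"],
    ["finance", "politics"],
    ["finance"],
    ["politics"],
    ["finance", "politics"],
]

# Turn the label sets into a binary indicator matrix, one column per category.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# CountVectorizer builds bag-of-words features; MultiOutputClassifier
# fits one independent LinearSVC per label column.
model = make_pipeline(CountVectorizer(), MultiOutputClassifier(LinearSVC()))
model.fit(articles, Y)

# Predict for an unseen article and map the 0/1 row back to category names.
predicted = model.predict(["bank shares fall after election"])
predicted_labels = mlb.inverse_transform(predicted)
print(predicted_labels)
```

Since MultiOutputClassifier trains a separate LinearSVC for each category, every label column in the training data needs both positive and negative examples.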
But I am not sure if there is a better algorithm for my use case. Any comments on my approach?