
I have text like this:

"India, officially the Republic of India is a country in South Asia."

I need to be able to get:

Country: India
Region: South Asia

Everything I found in scikit-learn's documentation classifies text into a single category. For example, I could train two classifiers, one to check whether a country is present and one to check whether a region is present. However, I want each classification to tell me which features it is picking up, a bit like NLTK's most informative features. How do I do this?

n00b
  • What algorithm are you using to train the classifiers? – Ali Jan 19 '16 at 08:59
  • @alivar I'm thinking of using SVM – n00b Jan 19 '16 at 09:16
  • Take a look at [this question](https://stackoverflow.com/questions/10526579/use-scikit-learn-to-classify-into-multiple-categories). You can also check multilabel classification in the [scikit-learn documentation](http://scikit-learn.org/stable/auto_examples/plot_multilabel.html#sphx-glr-auto-examples-plot-multilabel-py), in particular the [OneVsRest](http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html#sklearn.multiclass.OneVsRestClassifier) classifier. I think this is what you are looking for. – Rodrigo Laguna Feb 06 '18 at 15:27

1 Answer


If you use an SVM, this question at Cross Validated may get you started. The idea is to interpret the classifier's learned weights, though that is not trivial.

Personally, I prefer to use a `RandomForestClassifier`, which has feature ranking built in. The ranking is exposed through the `feature_importances_` attribute, and there is even an example in the scikit-learn documentation.

MB-F