
Hi, I am studying AI to build a chatbot. I am currently testing classification with sklearn, and I managed to get good results with the following code:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV

def tuned_nominaldb():
    global Tuned_Pipeline
    pipeline = Pipeline([
        ('tfidf', TfidfVectorizer(analyzer=text_process)),
        ('clf', OneVsRestClassifier(MultinomialNB(
            fit_prior=True, class_prior=None))),
    ])
    # hyperparameter grid searched over by GridSearchCV
    parameters = {
        'tfidf__max_df': (0.25, 0.5, 0.75),
        'tfidf__ngram_range': [(1, 1), (1, 2), (1, 3)],
        'clf__estimator__alpha': (1e-2, 1e-3),
    }

    Tuned_Pipeline = GridSearchCV(pipeline, parameters, cv=2, n_jobs=2, verbose=10)
    Tuned_Pipeline.fit(cumle_train, tur_train)
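For reference, once a `GridSearchCV` like the one above has been fit, its `best_params_` and `best_score_` attributes tell you which combination won. A minimal self-contained sketch (with toy sentences and labels standing in for `cumle_train`/`tur_train`, and a reduced grid so it runs quickly):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV

# toy data standing in for cumle_train / tur_train
texts = ["bad curse words", "election vote party",
         "holy scripture verse", "hello how are you"] * 5
labels = ["Bad Language", "Politics", "Religious", "General"] * 5

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(MultinomialNB())),
])
# reduced grid for illustration
parameters = {"clf__estimator__alpha": (1e-2, 1e-3)}

search = GridSearchCV(pipeline, parameters, cv=2)
search.fit(texts, labels)
print(search.best_params_, search.best_score_)
```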

My labels are:

  • Bad Language
  • Politics
  • Religious
  • General

When I enter a sentence, I get the correct label as output most of the time. But my problem is that I want to get multiple labels: for example, if a sentence combines bad language and politics, the model only predicts Bad Language. How can I get a multi-label output like Bad Language + Politics?

I tried to add the following code, but I got an error saying that a string was not expected for the fit method:

from sklearn.multioutput import MultiOutputClassifier

multiout = MultiOutputClassifier(Tuned_Pipeline, n_jobs=-1)
multiout.fit(cumle_train, tur_train)
print(multiout.predict(cumle_test))

Thanks a lot for your help.

GurhanCagin
  • You want multi-label results, but I don't think you can output multiple classes until your "Y" data is no longer a single vector and instead has multiple columns (one for each of your four labels). Here's a [good example](http://scikit-learn.org/stable/modules/multiclass.html#multilabel-classification-format). Here's an [answer](https://stackoverflow.com/a/19172087/1577947) that might help. – Jarad Apr 02 '18 at 17:44
  • Hi Jarad, thanks a lot for the information. Let me tell you what I understand: if you don't train with multi-label data, there is no way to combine results from a single-label training method. I have training data for the topics I mentioned above, but I don't have any for the combinations. I thought there might be a way such that, if a sentence's score is above some threshold, you could add that label to the output. – GurhanCagin Apr 02 '18 at 18:02
  • Correct. If I'm wrong, I'd be really surprised. The only examples of multi-label output I've seen have always had a "Y" that was encoded by MultiLabelBinarizer. I've never seen an example with a parameter you can set that automatically "extends" the output from a single predicted value to multiple values while training on a one-dimensional Y input. – Jarad Apr 02 '18 at 18:11
  • Hi Jarad, if you print Tuned_Pipeline.predict_proba([choice]), where choice is the sentence you entered for testing, it gives you the probability for each class, but that result is not enough to combine anything. – GurhanCagin Apr 02 '18 at 21:31
  • Each predict_proba row sums to 1. If this is the type of output you were going for, great! In my opinion, predict_proba is not a substitute for an actual multi-label classification problem, because you'd have to determine probability thresholds, and you can't just blindly take the top "n": you might have 3 labels near 0 probability and 1 with a high probability, which would give you 1 good label and 1 bad label. It is a creative idea, though. – Jarad Apr 02 '18 at 22:50
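The thresholding idea discussed in the comments can be sketched as follows. This is a minimal illustration with toy data standing in for `cumle_train`/`tur_train`, and an arbitrary cutoff of 0.25; as Jarad notes, such a threshold would need careful tuning and is not a real substitute for multi-label training:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier

# toy stand-ins for cumle_train / tur_train
texts = ["bad curse words", "election vote party",
         "holy scripture verse", "hello how are you"] * 5
labels = ["Bad Language", "Politics", "Religious", "General"] * 5

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(MultinomialNB())),
]).fit(texts, labels)

# probabilities for each class; rows sum to 1 in the single-label case
probs = clf.predict_proba(["curse words about the election"])[0]

threshold = 0.25  # arbitrary cutoff; would need tuning on held-out data
picked = [label for label, p in zip(clf.classes_, probs) if p >= threshold]
print(picked)
```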

1 Answer


Since you are using OneVsRestClassifier, it trains one binary classifier for each label. This means that, given a multi-label target, several of those per-label classifiers can fire on the same sentence, so you can get multiple labels from it. I suggest you check these links:
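To make that concrete: the key step, as suggested in the comments, is encoding the target with MultiLabelBinarizer so that "Y" has one 0/1 column per label. A minimal sketch with a hypothetical tiny corpus where some sentences carry two tags at once:

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier

# hypothetical training data: each sentence has a LIST of labels
texts = ["damn those politicians", "vote in the election",
         "damn curse words", "hello my friend"] * 5
tags = [["Bad Language", "Politics"], ["Politics"],
        ["Bad Language"], ["General"]] * 5

# turn the list-of-label-lists into a 0/1 indicator matrix, one column per label
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", OneVsRestClassifier(MultinomialNB())),
]).fit(texts, Y)

# each positive column in the prediction maps back to a label name
pred = clf.predict(["damn politicians, go vote"])
print(mlb.inverse_transform(pred))
```

With this setup, OneVsRestClassifier fits one binary MultinomialNB per column of `Y`, and any number of them can predict 1 for the same sentence, which is exactly the Bad Language + Politics output asked about.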

LeandroHumb