Feature importances from a scikit-learn pipeline (SVC)

Question

I have the following pipelines, and I want to get the feature weights with respect to each class. I have three classes ('Fiction', 'None-fiction', 'None'). The classifier I use is SVC.

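# Imports needed by this snippet; ItemSelector is a custom transformer assumed to be defined elsewhere.
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

Book_contents= Pipeline([('selector', ItemSelector(key='Book')),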
                         ('tfidf',CountVectorizer(analyzer='word',
                                                  binary=True,
                                                  ngram_range=(1,1))),
                        ])

Author_description= Pipeline([('selector', ItemSelector(key='Description')),
                              ('tfidf', CountVectorizer(analyzer='word',
                                                        binary=True,
                                                        ngram_range=(1,1))),
                             ])

ppl = Pipeline([('feats', FeatureUnion([('Contents',Book_contents),
                                        ('Desc',Author_description)])),
                ('clf', SVC(kernel='linear',class_weight='balanced'))
               ])

model = ppl.fit(training_data, Y_train)

I have tried eli5, but I got an error about a mismatch between the feature names and the classifier.

f1=model.named_steps['feats'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()
f2=model.named_steps['feats'].transformer_list[1][1].named_steps['tfidf'].get_feature_names()
list_features=f1
list_features.append(f2)
explain_weights.explain_linear_classifier_weights(model.named_steps['clf'],
                                                  vec=None, top=20,
                                                  target_names=ppl.classes_,
                                                  feature_names=list_features)

I got this error:

feature_names has a wrong length: expected=47783, got=10528

How can I get the ranking of feature weights with respect to each class? Is there a way to do that without eli5?

Please explain what you are doing in this code: `feature_names=model.named_steps['feats'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()`? — Vivek Kumar, Sep 10 '18 at 09:35
Hi @VivekKumar, this code accesses the steps of the pipeline in order to get the features, but I am not sure if this is the correct way to do it — Abrial, Sep 10 '18 at 09:44
That's why I asked. You only accessed the features from the first part of the FeatureUnion, but not from the second — Vivek Kumar, Sep 10 '18 at 09:49
Thank you for pointing that out @VivekKumar, but even after adding the features from each part I still cannot access the weights. Is there another way other than eli5? Something with coef_, maybe? — Abrial, Sep 10 '18 at 09:52
I just edited the post with the required details. Hope it is clear now — Abrial, Sep 10 '18 at 10:10

score 2 · Answer 1 · answered Sep 17 '18 at 12:20

You are doing everything correctly except for this line:

list_features.append(f2)

Here you append the whole f2 list as a single element of the f1 list, so the result is only one item longer than f1. This is not what you want.

You want to add all the elements of f2 to f1. For that you need to use extend. Just do this:

list_features.extend(f2)
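
To make the difference concrete, here is a minimal sketch with two short stand-in lists (the names `appended` and `extended` and the sample words are made up for illustration; the real `f1` and `f2` come from the vectorizers):

```python
# Stand-ins for the two feature-name lists returned by the vectorizers.
f1 = ['alpha', 'beta']
f2 = ['gamma', 'delta']

# append adds f2 itself as ONE nested element at the end.
appended = list(f1)
appended.append(f2)

# extend adds each element of f2 individually.
extended = list(f1)
extended.extend(f2)

print(len(appended), appended)
print(len(extended), extended)
```

With append, the combined list is only one item longer than `f1`, so its length cannot match the number of columns produced by the FeatureUnion.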

See this question for more details:

Difference between append vs. extend list methods in Python

In addition to that, I think the way you call explain_weights.explain_linear_classifier_weights is wrong. You just need to call explain_weights(...), and it will internally call explain_linear_classifier_weights for you.
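
As for doing this without eli5: a rough sketch, under some assumptions, is to read the weights from `coef_` directly. `coef_` is only available for a linear kernel. A multiclass SVC is trained one-vs-one, so `coef_` has one row per pair of classes rather than one row per class; the scikit-learn documentation gives the row order as 0 vs 1, 0 vs 2, 1 vs 2 over `classes_`. When the model is fitted on sparse input, `coef_` is itself sparse and `.toarray()` converts it. The helper name `top_features_per_pair` and the toy numbers below are hypothetical and only stand in for `model.named_steps['clf'].coef_.toarray()`, the combined feature list, and `ppl.classes_`.

```python
from itertools import combinations

def top_features_per_pair(coef_rows, feature_names, classes, top=20):
    """Rank features by absolute weight for each one-vs-one class pair.

    coef_rows: one sequence of weights per class pair, in the order
               produced by combinations(classes, 2).
    """
    pairs = list(combinations(classes, 2))
    ranked = {}
    for pair, row in zip(pairs, coef_rows):
        # Indices of the features, largest absolute weight first.
        order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
        ranked[pair] = [(feature_names[i], row[i]) for i in order[:top]]
    return ranked

# Toy data (invented): three class pairs, four features.
coef_rows = [[0.9, -0.1, 0.0, -0.7],
             [0.2, 0.8, -0.3, 0.0],
             [-0.5, 0.1, 0.6, 0.05]]
feature_names = ['dragon', 'history', 'the', 'memoir']
classes = ['Fiction', 'None', 'None-fiction']  # sorted, like classes_

ranked = top_features_per_pair(coef_rows, feature_names, classes, top=2)
for pair, feats in ranked.items():
    print(pair, feats)
```

If one weight vector per class is what you are after, LinearSVC is a possible alternative: it trains one-vs-rest, so its `coef_` has one row per class. That is a different classifier from the one in the question, so results may differ.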

He could also do `list_features += f2` — Mindcraft, Sep 18 '18 at 01:19
