
I have the following pipelines, and I want to get the feature weights with respect to each class. I have three classes ('Fiction', 'None-fiction', 'None'). The classifier I use is SVC.

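from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
# ItemSelector is a custom transformer, not part of scikit-learn

Book_contents= Pipeline([('selector', ItemSelector(key='Book')),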
                         ('tfidf',CountVectorizer(analyzer='word',
                                                  binary=True,
                                                  ngram_range=(1,1))),
                        ])

Author_description= Pipeline([('selector', ItemSelector(key='Description')),
                              ('tfidf', CountVectorizer(analyzer='word',
                                                        binary=True,
                                                        ngram_range=(1,1))),
                             ])

ppl = Pipeline([('feats', FeatureUnion([('Contents',Book_contents),
                                        ('Desc',Author_description)])),
                ('clf', SVC(kernel='linear',class_weight='balanced'))
               ])
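The question does not show ItemSelector. It is presumably the custom transformer from the scikit-learn "Feature Union with Heterogeneous Data Sources" example, which is not shipped with scikit-learn itself. A minimal sketch under that assumption, so the pipelines above can run on a dict or DataFrame with 'Book' and 'Description' keys:

```python
from sklearn.base import BaseEstimator, TransformerMixin


class ItemSelector(BaseEstimator, TransformerMixin):
    """Pick one key (e.g. a DataFrame column) out of the input.

    Assumed to match the selector used in the question; the real one may differ.
    """

    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        # Nothing to learn: selection is stateless.
        return self

    def transform(self, data_dict):
        # Return the raw texts for this key, to be fed into CountVectorizer.
        return data_dict[self.key]


data = {"Book": ["a tale of dragons"], "Description": ["born in 1950"]}
print(ItemSelector(key="Book").fit_transform(data))  # ['a tale of dragons']
```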

model = ppl.fit(training_data, Y_train)   

I have tried eli5, but I got an error about a mismatch between the feature names and the classifier.

f1=model.named_steps['feats'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()
f2=model.named_steps['feats'].transformer_list[1][1].named_steps['tfidf'].get_feature_names()
list_features=f1
list_features.append(f2)
explain_weights.explain_linear_classifier_weights(model.named_steps['clf'], 
                                              vec=None, top=20, 
                                              target_names=ppl.classes_, 
                                              feature_names=list_features)

I got this error:

feature_names has a wrong length: expected=47783, got=10528

How can I rank the feature weights with respect to each class? Is there a way to do that without eli5?

Abrial
  • Please explain what you are doing in this code: `feature_names=model.named_steps['feats'].transformer_list[0][1].named_steps['tfidf'].get_feature_names()`? – Vivek Kumar Sep 10 '18 at 09:35
  • Hi @VivekKumar, this code accesses the steps of the pipeline in order to get the features, but I am not sure whether this is the right way to do it. – Abrial Sep 10 '18 at 09:44
  • That's why I asked. You only accessed the features from the first part of the FeatureUnion, but not from the second. – Vivek Kumar Sep 10 '18 at 09:49
  • Thank you for pointing that out, @VivekKumar, but even after adding the features from both parts I still cannot access the weights. Is there a way other than eli5, something with coef_ maybe? – Abrial Sep 10 '18 at 09:52
  • How are you doing that? – Vivek Kumar Sep 10 '18 at 10:02
  • I just edited the post with the required details. Hope it is clear now – Abrial Sep 10 '18 at 10:10

1 Answer


You are doing everything correctly except for this line:

list_features.append(f2)

Here you append the whole f2 list as a single element of the f1 list. This is not what you want.

You want to add each element of f2 to f1. For that you need to use extend. Just do this:

list_features.extend(f2)
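A quick illustration with hypothetical stand-in vocabularies (the real f1 and f2 come from the two CountVectorizers). It also suggests why eli5 reported got=10528: append adds only one element, so the resulting length is presumably len(f1) + 1 rather than len(f1) + len(f2). Working on a copy (list(f1)) avoids modifying f1 itself, since list_features=f1 makes both names point to the same list.

```python
# Hypothetical stand-ins for the two vectorizers' feature names.
f1 = ["castle", "dragons"]
f2 = ["author", "born"]

appended = list(f1)
appended.append(f2)   # adds f2 as ONE nested element
print(appended)       # ['castle', 'dragons', ['author', 'born']]
print(len(appended))  # 3

extended = list(f1)
extended.extend(f2)   # adds every element of f2
print(extended)       # ['castle', 'dragons', 'author', 'born']
print(len(extended))  # 4
```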

See this question for more details:

In addition, I think the way you call explain_weights.explain_linear_classifier_weights is wrong. You just need to call explain_weights(...), and it will internally dispatch to explain_linear_classifier_weights.

Vivek Kumar